From Listening Device to Helpful Partner

Have a Google Home? Try asking it “can I trust you?” The response isn’t particularly comforting: a corporate statement about how it’s designed to keep your information secure, plus a link sent to your phone so you can review the data it has collected. I followed that link and found that my two most recent saved commands had been recorded by accident and contained parts of conversations I was having with other people!

When the Google Assistant thinks it hears its name, it comes to life and listens, whether or not you actually summoned it. Now that we can peer into this data-collection process and see its flaws, it is up to us to decide whether we can trust it. The experience prompted me to think about how we relate to the digital devices around us, and how that might help us rethink the particular form of listening devices like the Google Home. Continuing an earlier thread on rethinking the design of these devices, this blog post explores what we know about how people interact with digital assistants, now and in the future.

[Image: a Google Home device. Source: The Logical Indian]

Evolution has given humans some pretty useful tools for deducing the emotion and intent behind human actions, including the ability to tell whether someone is listening to us. The cues a person gives off when actively listening to another are completely absent from the design of these modern listening devices, and a little light doesn’t fill that void. Sherry Turkle writes in “In Good Company” that a robot imitating human gestures and movements pushes our “Darwinian buttons” and allows the robot to “exhibit the kinds of behavior people associate with sentience, intentions, and emotions.” Receiving the social cues we subconsciously crave actually improves the functionality of these listening devices. When the human brain sees a device exhibit some sign of social competency, it switches to a more two-sided mode of conversation; it no longer feels like asking a question of a blob on your counter. That shift can lead to more efficient ways of communicating with a device, and can even lead some people to use it more often.

There are also non-visual ways to improve the conversations we have with these devices. Researchers in the field of affective computing are developing new ways for people to communicate with robots and services, making them more efficient and accessible. With affective computing, devices can interpret human emotion, respond to requests more effectively, and imitate natural, comfortable conversation. Beyond efficiency, these techniques can allow people with social disorders, who might otherwise struggle to interact with these devices, to hold an effective dialogue with them. On the flip side, adding more sensors to computational devices opens up an even wider world of data collection and surveillance. Last year Amazon announced its “Halo” wristband, which among other things can eavesdrop on the tone of your voice to guess whether you are feeling happy or sad.
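To make the idea of inferring emotion from voice concrete, here is a deliberately tiny sketch. It is not how the Halo or any real product works; real systems use learned models over rich audio representations. The two features and the thresholds below are invented purely for illustration:

```python
# Toy affective-computing sketch: guess emotional tone from two simple
# acoustic features (average pitch and loudness). Feature names and
# thresholds are invented for illustration, not drawn from any product.

def classify_tone(mean_pitch_hz: float, energy: float) -> str:
    """Very rough mood guess from average pitch and normalized loudness."""
    if mean_pitch_hz > 180 and energy > 0.6:
        return "excited"
    if mean_pitch_hz < 140 and energy < 0.3:
        return "subdued"
    return "neutral"

print(classify_tone(200, 0.8))  # excited
print(classify_tone(120, 0.2))  # subdued
```

Even this crude rule hints at the surveillance concern: once a microphone is always on, the same signal that answers your question can also be mined for how you were feeling when you asked it.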

Conversational Technology

Without proper feedback during conversation, there is a disconnect in our brains, and we have a harder time seeing these listening devices in the role they are trying to play: social robots. As AI companies inch closer to systems capable enough to appear intelligent, sociability offers a way for devices to mimic intelligence without actually being emotionally intelligent. A device that can properly pantomime interaction could be more effective at its job, since the human brain’s most effective method of communication is spoken conversation with another human.

In “Anthropomorphism and the Social Robot,” Brian R. Duffy argues that the “ultimate goal of the human–computer interface should be for the interface to ‘disappear.’” This assertion builds on a belief widely shared by the creators of these interfaces: that computation should fade away and simply become part of our surroundings. Groups whose members are at risk from expanding surveillance architectures vocally contest this point.

[Image: triangle of representation, with abstract, human, and iconic at its corners. Source: “Anthropomorphism and the Social Robot”]

While we can now interface with listening devices without touching anything, speaking to one isn’t exactly the seamless interaction it could be. You can’t always ask these devices questions the way you would ask a friend; they don’t understand context, can struggle with follow-up questions, and you may even find yourself talking in a different, more robotic voice while using one. When I talk to my Google Home, I often feel like I’m carefully stringing together the right words to get a relevant answer. It feels stunted and awkward. Making the user feel as if they are having a conversation with the device is how we turn these listening devices into social robots. Our hypothesis is that creating a more embodied device, one that cues the intent to have a conversation with a human, is a different path forward than simply having the device “disappear.”
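The follow-up-question gap can be sketched in a few lines of code. This is a toy, not how any real assistant works: to answer “When was it built?”, the device has to remember what “it” refers to, a problem real systems tackle with dialogue-state tracking far beyond the naive pronoun substitution below. All names and logic here are invented for illustration:

```python
# Toy sketch of the follow-up-question problem: the device must carry
# context from one utterance to the next to resolve pronouns like "it".
# Entity spotting here is deliberately naive (capitalized words only).

class ContextTracker:
    def __init__(self):
        self.last_entity = None  # most recently mentioned named thing

    def resolve(self, utterance: str) -> str:
        words = utterance.split()
        # Naive entity spotting: capitalized words after the first word.
        entities = [w.strip("?.,!") for w in words[1:] if w[:1].isupper()]
        if entities:
            self.last_entity = " ".join(entities)
            return utterance
        if self.last_entity:
            # Substitute the remembered entity for the pronoun "it".
            words = [self.last_entity if w.strip("?.,!").lower() == "it" else w
                     for w in words]
        return " ".join(words)

ctx = ContextTracker()
print(ctx.resolve("How tall is the Eiffel Tower?"))
print(ctx.resolve("When was it built?"))  # When was Eiffel Tower built?
```

The point of the sketch is how much invisible bookkeeping even one follow-up question requires; a human conversation partner does this effortlessly, which is exactly the standard these devices are measured against.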

Embodiment

If you’ve ever sent a sarcastic text or email that didn’t land, you know how difficult it is to express certain emotions and inflections when the other party can’t see you. Humans can also anticipate actions and words in conversation: we begin formulating responses before the other party finishes, and we notice when someone is about to start talking or take a pause. Being able to sense and anticipate what comes next lets everything flow together, turning a conversation into a smooth back-and-forth rather than a jagged request and response. These skills are essential, and people who lack them often struggle in daily life.

Listening devices don’t have these additional perceptual affordances. They can’t notice the nuance of human speech or observe our body language. That may become possible in the future, and we’d guess it would help smooth these interactions. Physically embodying the equivalent of a nod, or an “uh huh,” is an approach worth exploring further.

Continued Exploration

This research and background on perception and human–technology relationships informs our work rethinking the design of listening devices. Our goal is to create a device that not only reveals that it is listening through anthropomorphic means, but also stimulates smoother conversation through feedback. We hope this work eventually delves into questions of perceptual capability, but for now it is fun to consider the future design of a device that more fully acknowledges and embraces its role as an omnipresent social robot.

Eric Lord

Eric Lord is an industrial designer from South Florida. He received his Bachelor of Science from Virginia Tech's College of Architecture and Urban Studies in 2019 and is currently working toward his Master's in Experience Design at Northeastern University. His primary interests lie in consumer products, kitchenware, and footwear.

🌎 - https://lorderic.myportfolio.com on the web

Rahul Bhargava

Assistant Professor, Journalism and Art + Design, Northeastern University

📨 - r.bhargava@northeastern.edu on email
🐦 - @rahulbot on Twitter
🌎 - https://rahulbotics.com on the web

Filed Under » internet-of-things, design
