Personality for voice interfaces: Humanizing the most human of user experiences

June 22, 2021

My newest book Voice Content and Usability is now available! Want to know more about A Book Apart’s first-ever title on voice interface design? Buy the book, learn what’s inside, and subscribe for more insights like this.

As I write in my newest book Voice Content and Usability, when we design voice interfaces, we need to consider not only the mechanisms that machines use for spoken language but also the narrative nature of human conversation. This isn’t just because voice interfaces are different in crucial ways from written conversational interfaces. It’s also because human speech introduces a significant new foible into our interfaces that mostly visual user experiences have had to deal with only in abstract senses: the dual problem of identity and personality.

The vast majority of visual and physical interfaces around us—computers and televisions, kiosks and devices—seldom impose a human identity on a user experience. But synthesized speech is different, because humans use it to import all of their preconceived notions about the sort of person whose identity is being intentionally or unintentionally cultivated. Because they approximate human identities, voice interfaces need to engage in personification that can introduce significant complexity to your design process.

In this article, I dive into some of the perils and promises behind giving our most human of interfaces personalities of their own. There’s much more about this realm of user experience in my newest book Voice Content and Usability, which is now available from A Book Apart. It is my publisher’s first-ever title on voice and the industry’s first book on voice content. If you’re not sure whether the book is for you, you can see what’s inside, look at my previous content on voice, and subscribe to my newsletter.

The hidden cost of humanizing voice interfaces

Establishing a human identity can give life to normally sterile and austere interfaces like IVR systems and voice interfaces, but it comes at a cost. Unfortunately, the sexism inherent to how we treat or think of executive assistants in our society extends to how we approach voice interfaces as well. As Michael Cohen, James Giangola, and Jennifer Balogh found, the human identity cultivated by the creators of a voice interface is critical to its reception among users, for reasons outside the voice interface’s control:

In usability studies of a voice-browser application, callers expressed a sense of loyalty to the professional, matter-of-fact, but personal and familiar character representing the system—an administrative assistant in her early 40s. Users felt comfortable entrusting to her their most sensitive personal information, such as credit card details. But a well-designed, likeable persona is essentially an illusion, a mosaic built up of consistent, conversational messages that collectively suggest a coherent, sociolinguistically familiar presence.

The illusion that Cohen and his collaborators speak of mirrors the confluence of biases and systems of oppression that prejudice us in favor of one type of voice assistant over another. It’s essential to recognize that particular traits applied to speech synthesizers to establish identity may in fact intensify and perpetuate the marginalization that many people, including the oppressed people in your user base, face on a daily basis in society. I write much more about the issue of inclusion and equity in my book Voice Content and Usability, available now.

Writing dialogues that grant personality

Voice interfaces encompass not only the core linguistic behaviors that make up spoken language but also the nonverbal paralinguistic features that communicate subtext and meaning beyond the core message itself. Both overt linguistic and covert paralinguistic cues are essential to crafting a human identity for your voice interface and instilling a sense of a soul and feelings where none exist. But today, many voice assistants still lack the customizability and flexibility for designers to adjust behaviors beyond the spoken word itself.

So let’s start with the levers of speech that we can control most flexibly: the words we use to write dialogues. As voice interface designer James Lewis writes in Practical Speech User Interface Design, “Accepted standards for terminology, formality, and interaction style in spoken communication vary based on demographic factors such as culture and socioeconomic status, as well as the subject matter or purpose of the conversation.” How we write dialogues for our voice interfaces should reflect the typical human conversation we are hoping to approximate.

Large company (formal register and direct, businesslike tone): “Welcome to WakandaJet, where the safety of our customers and crew is our first priority.”

Small business (family-owned restaurant, colloquial register, and gregarious tone): “¡Bienvenidos! and welcome to Café Oaxaca, where we treat you like family! We’re glad you’re here!”

Is your voice content intended to come close to the experience of chatting with a hotel concierge, a gate agent at the airport, or a trusted health provider? Or is your voice content meant to be conveyed by a youthful, casual tour guide or a slangy restaurant host? Markers transmitting these important cues include honorifics (polite forms of address like madam or sir, which can exclude non-binary folx), words that evoke different registers (informal y’all as opposed to formal valued guests), or even regionalisms that might be appropriate for a more local audience (e.g. using bienvenidos instead of welcome to greet visitors who code-switch between English and Spanish).

How we humanize our voice interface also involves paralinguistic levers governing consciously or subconsciously detected subtext or emotion: how it’s said rather than what is said. If you are able to manipulate deeper characteristics of your voice interface at the speech synthesizer level, you can adjust aspects such as “pitch, speech rate, voice quality, and loudness” as well as emotional behaviors like “sighs, gasps, and laughter,” write Michael McTear, Zoraida Callejas, and David Griol in The Conversational Interface. It’s important, however, to use such traits sparingly, especially as you broaden your audience, because certain traits found in socially acceptable speech in one demographic, like sarcasm or loud speech, might be anathema to another.
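To make those paralinguistic levers a little more concrete, here is a minimal sketch of how pitch, speech rate, and loudness might be expressed in SSML, the W3C markup that many speech synthesizers accept. The helper name prosody_ssml is my own invention for illustration, and which attribute values a given platform actually honors is an assumption you should check against that platform’s documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, pitch="medium", rate="medium", volume="medium"):
    """Wrap text in an SSML <prosody> element (hypothetical helper).

    The attribute values are the named levels defined by SSML; support
    for each level varies by speech synthesizer, so treat them as
    assumptions rather than guarantees.
    """
    attrs = " ".join(
        f"{name}={quoteattr(value)}"
        for name, value in (("pitch", pitch), ("rate", rate), ("volume", volume))
    )
    # escape() protects characters such as & and < inside the spoken text.
    return f"<speak><prosody {attrs}>{escape(text)}</prosody></speak>"


# A slower, softer delivery for the formal airline greeting.
greeting = prosody_ssml(
    "Welcome to WakandaJet, where the safety of our customers and crew "
    "is our first priority.",
    rate="slow",
    volume="soft",
)
print(greeting)
```

Keeping these settings in one small function makes it easier to apply them sparingly and consistently, rather than scattering one-off adjustments across every prompt.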

Assigning personality to voice interfaces

The notion of a voice interface’s personality is deeply enmeshed with a voice interface’s identity. Erika Hall defines personality in Conversational Design as “the consistent set of human characteristics embodied by your product, service, or organization.” These are borne out by linguistic and paralinguistic cues as well as more subtle traits like intimacy, familiarity, and humor.

Because voice interfaces are fundamentally soulless automata, we need to inject them with a sense of personality that will entice users to keep coming back. But as Joscelin Cooper notes, even if interface text faithfully mimics the “lilt, flow, and syntax of human speech,” there is always a risk of creating a “false intimacy that distances even as it attempts to foster familiarity.” A voice interface displaying overfamiliarity can come across as creepy or threatening, often because of the uncanny valley effect that makes us feel that overly humanized interfaces are eerie and aloof.

Overfamiliarity: “Welcome to WakandaJet, where we treat our passengers like family! Can you give me your first and last name to get started?”

The same goes for humor; as chatbot designer Eunji “Jinny” Seo writes, “There is a fine line between cute and annoying, or even dangerous … It’s one thing to be funny, but find ways to do it without risking user trust.”

Dangerous humor: “Your flight was cancelled. Just kidding! It’s on time and leaving out of Gate E4.”

Conclusion

Voice interfaces juggle concepts that are wholly unfamiliar to humans engaging in normal conversation: prompts, intents, and responses that adhere to a clear sense of inputs and outputs. But it’s also important to give due consideration to the human triggers that decorate every organic conversation, namely the discourse markers that link strands of dialogue and the key moments that help to compartmentalize conversations into narrative structures: introduction, orientation, action, and guidance.

Our voice interfaces are capable of confirming or challenging our deeply held biases about other people. Today’s audiences for voice interfaces are increasingly attuned to the need for representation of the marginalized and underrepresented: bilingual or dialectal code-switching, colloquialisms used among people of color and queer communities, and speech synthesizer customization. Because voice interfaces are the most human of all digital experiences, we must respect and acknowledge the humanity of those we aim to reach.

Voice interface design is an uneasy fusion of two distinct design approaches: the dialogues that underpin the voice interface’s narrative, its rhetorical turns of phrase, and its propulsive force toward the appropriate content for the user—and the flows along which users traverse conversational interfaces, like rivers etching their way down watercourses. For a full discussion of how dialogues and flows intersect with one another in complex and nuanced ways, buy my newest book Voice Content and Usability, available now.


preston.so