Can your content speak for itself? Introducing Voice Content and Usability from A Book Apart

May 5, 2021

Want to skip ahead and learn more about Voice Content and Usability, A Book Apart’s first voice book? You can go straight to what’s in the book or sign up for preorders.

Voice interfaces still represent a journey riddled with obstacles and challenges, because for the last two decades, we’ve by and large been operating within the confines of visual interfaces. Building a voice interface for users accustomed to physical and visual interaction with their devices introduces unique aural and verbal challenges to web-biased disciplines like content strategy, design, information architecture, and usability testing.

Content today takes many different forms in our ever-changing efforts to reach every customer on every device in just the right ways—like immersive experiences. But for the most part, the text and media we use to transmit information down the wire in customer experiences revolve around the written word and its visual underpinnings. Only recently have we begun to consider the impact of delivering content aurally through voice interfaces and the spoken word. Let me ask you:

Does your content speak for itself?

Teams around the world are grappling with the growth in voice assistants, smart speakers, and voicebots, but our content strategy and design techniques haven’t kept up. When it comes to voice content (that is, content delivered through voice interfaces), it’s clear that we as architects, copywriters, designers, and developers have a lot of catching up to do.

Voice content demands a completely different approach, and it forces us to play in the sandbox not of visual screens and handheld gadgets but of the ancient vernacular and instinctual habits of our most primordial practice: natural human conversation. Instead of us contorting our brains to type on keyboards and move cursors around, it’s machines that have to do the heavy lifting to chat with us. In some ways, it’s an advantage. In others, it just reveals how much our content strategy and design tactics remain rooted in the visual and artificial rather than the verbal and organic.

Fortunately, my new book Voice Content and Usability, the first voice title from the publisher A Book Apart, is here to help. From understanding the nuanced distinctions between spoken and written interfaces to authoring and executing voice content from the standpoint of existing web content, my book will have something for every content practitioner and designer, whether new to voice design or seasoned in the ways of voice interfaces. You can sign up for preorders or read on for more.

Voice interfaces aren’t like other conversational interfaces

We all have a natural understanding of what a conversation is. It’s a social means of communication between two or more people that straddles a spectrum between small-talk banter on the one end and momentous decision-making on the other. It’s a context-dependent joint activity that builds consensus, conveys social cues, trawls for information, and guides transactions. Many of the conversations we have with each other, especially these days, are written, like our text messages, WhatsApp groups, and Slack channels. There’s a visual foundation for our conversations.

But spoken language has no such luxury, as Michael McTear, Zoraida Callejas, and David Griol explain in The Conversational Interface. Some of the complexity in handling human speech involves actions like grounding, in which two people who may not have understood each other clearly engage in the less useful but highly necessary clarification of what was said. And apart from the nonverbal cues that decorate conversations with emphasis and emotional context, there are also verbal cues and vocal behaviors known as paralinguistic phenomena: how something is said, not what. Our spoken language conveys much more than the written word could ever be capable of.

So what exactly makes up not just a good conversation, but a good spoken one?

Without the identifiable motifs and tropes found in human conversation, voice interfaces that don’t reflect how people actually converse with others can’t succeed. “Conversation is not a new interface,” writes Conversational Design author Erika Hall. “It’s the oldest interface.” The quandary of how to humanize machine utterances reaches far beyond linguistic questions like how computers produce recognizable speech and how synthesized speech ought to sound. It also concerns how to strike a balance between faithfully approximating authentic spoken, not written, conversation and efficiently bringing the user to their goal.

Human conversation has its own laws, refined through millennia of societal honing, though they may seem distant from the algorithms and rules that govern speech synthesis or natural language processing (NLP) in voice interfaces. Voice designer Cathy Pearl argues in Designing Voice User Interfaces that spoken human conversations can be identified based on certain baseline requirements that even the most casual of across-the-bar chats share: contextual awareness and a sense of the interlocutors’ surroundings, a working memory of previous conversations, and a turn-based back-and-forth.

When architecting voice interfaces, it’s essential to remember that the laws governing written language are leagues away from the rules outlining spoken language. And when our interfaces need to deploy voice content, the challenge becomes more striking still.

Why voice content?

These days, voice interfaces seem to be ubiquitous. Many of us now have smart speakers or Alexa devices at home to run errands for us or play that favorite song. But the vast majority of these user experiences remain transactional, not informational. That means they’re excellent at checking our credit card balances for us or ordering pizzas on our behalf, but they’re not so great at discussing Star Trek: The Motion Picture with us or answering our questions about how to register to vote.

Voice content is still largely unexplored territory when it comes to widely covered areas like accessibility, content strategy, and usability testing. Though there are now dozens of books in the crowded landscape of conversational design about chatbots and voice interfaces, none of those focuses uniquely on the strategies and tactics needed to architect and implement content-driven, or informational, voice interfaces. For many organizations responsible for distributing information rather than conducting transactions, voice interfaces have long been a no-go.

With Voice Content and Usability, here’s how that changes and what you can learn by reading my new book:

Conversations with computers. Voice differs in crucial ways from the larger conversational landscape when it comes to the interfaces in question and key differences between spoken and written content. Learn what voice interactions are and how informational and transactional voice interfaces differ.
Getting content ready for voice. Content-first design approaches require making sense of a content model usually well-suited for the web but typically less so for conversation, and even less so for voice. Content legibility and discoverability are key to successful voice-driven content.
Crafting dialogues. Unlike visual and manual interfaces, interfaces that deliver voice content require different approaches to feedback, error handling, interface text, and help text—in voice, good design means good writing.
Diagramming flows. Interfaces that work with voice content need to expose it in discoverable ways by leveraging aural, not visual, approaches to wayfinding rooted in habitable and linear flows.
Readying voice content for launch. Unlike chatbots and other conversational interfaces, voice content requires unique approaches to usability testing that challenge our existing web-based approaches to preparing for launch.
The future of voice content. Accelerating innovation in voice experiences is opening the door to new possibilities in accessibility, inclusive design, voice assistants and other interfaces, and the very notion of what content is and how we deliver it.

This is just a short summary of what’s in Voice Content and Usability, with real-world case studies rooted in one of the first-ever voice interfaces for content: the Ask GeorgiaGov Alexa skill built for residents of the state of Georgia.

Conclusion: How to get a copy

In the coming weeks leading up to the launch of my book on June 22nd, I’ll be sharing more insights about voice content and why content strategists and designers everywhere should prepare their content for a digital channel that shows no signs of going away.

Later this month, preorders will be opening on the A Book Apart website, but in the meantime, see my book’s website and subscribe to my newsletter to be the first to know how you can get a copy. Also, follow me on LinkedIn and Twitter to keep track of my latest articles, and check back on my blog for more never-before-seen insights about voice content and the promise it holds for your team.

Don’t believe me? By the time you finish my book, your content will speak for itself.

preston.so