Building usable conversations: How to approach conversational interfaces

January 18, 2019

This is part of a series of articles on conversational content strategy, with installments about information architecture, design, content strategy, usability testing, and Ask GeorgiaGov, the first voice interface for residents of the state of Georgia. Reprinted from the Acquia Developer Center with permission from DC Denison (Senior Editor, Acquia).

To kick off 2019 properly, the Experience Express is taking a break from Drupal and web development to consider an oft-forgotten component of new digital experiences in the conversational space. Though many organizations, some of Acquia's customers included, have leapt headlong into building conversational interfaces, sometimes it can be difficult in such a newfangled paradigm to consider all possible angles where things can go awry.

Conversational interfaces, and especially voice-driven interfaces, are the "oldest interface" — as Conversational Design author Erika Hall writes — and the most human of all user interfaces we use today. As a result, there are particular characteristics in conversation and voice that can make or break your conversational experience even before it has launched. As we've seen countless times with digital experiences, code is often secondary to design, usability, and other elements that we now take for granted when it comes to building websites.

A survey of conversational interfaces

"We should be able to use the same principles to make our digital systems easy and intuitive to use by finally getting the machines to play by our rules." —Erika Hall, Conversational Design

Many of today's user interfaces have the lofty goal of reducing friction to zero. Consider, for instance, the act of ordering a pizza, something that today can be a painless process. Previously, there were always elements of friction in the experience. Ordering on a pizzeria's website forces us to use a keyboard and mouse, which are artificial and learned interfaces. The same occurs when we order through a smartphone application. But ordering a pizza through a more human conversation does more to reduce friction than any evolution in the existing user interfaces on our smartphones and laptops.

As Erika Hall writes, "The ideal interface is an interface that's not noticeable at all." And today, there is a dizzying array of conversational interfaces available, all with the goal of presenting the user with as little friction as possible as they accomplish their tasks. Whereas some conversational interfaces are monochannel (or hardware-specific, like Amazon Alexa or Google Home), others are omnichannel (or channel-agnostic, like chatbots that can be used on multiple devices). Nonetheless, the emergence of channel-agnostic frameworks that integrate with multiple conversational interfaces, including Dialogflow, Amazon Lex, Wit.ai, IBM Watson, and others, is blurring the previously clear boundary between monochannel and omnichannel interfaces.

In addition to this dichotomy, conversational interfaces also have varying levels of flexibility when it comes to developer experience. Monochannel conversational interfaces that are tightly coupled to hardware, such as Apple Siri and Microsoft Cortana, tend to offer developers less ability to customize responses or interpret users' intentions. Meanwhile, more flexible APIs and approaches can be found in voice assistants that place a high premium on developer-created content, like Amazon Alexa and Google Home.
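To make the channel-agnostic idea more concrete, here is a minimal Python sketch, assuming a single shared handler whose reply is rendered differently per channel. The names Reply, handle, render_voice, and render_chat are hypothetical and do not correspond to the API of Dialogflow, Amazon Lex, or any other framework mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reply:
    """Channel-neutral reply: a statement plus an optional follow-up prompt."""
    text: str
    prompt: str = ""


def handle(utterance: str) -> Reply:
    # Hypothetical conversation logic, written once and shared by every channel.
    if "pizza" in utterance.lower():
        return Reply("Sure!", "What size would you like?")
    return Reply("Sorry, I didn't catch that.")


def render_voice(reply: Reply) -> str:
    # A voice channel speaks the whole reply as one continuous utterance.
    return " ".join(part for part in (reply.text, reply.prompt) if part)


def render_chat(reply: Reply) -> List[str]:
    # A chat channel can send each part as its own message.
    return [part for part in (reply.text, reply.prompt) if part]


reply = handle("One cheese pizza, please.")
print(render_voice(reply))
print(render_chat(reply))
```

The point of the design is that only the thin rendering functions know about the channel, so adding a new channel would not require touching the conversation logic itself.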

The currency of a conversation

Now that we have delved into the technological underpinnings of the conversational interface ecosystem, we can turn to the elements that make a conversation compelling for users. Although we will also concern ourselves with conversational design and some of its key principles, in this section I focus solely on the types of conversations that exist in the real world and how we handle them.

Consider the following two conversations, in which Sabina and Alex are both interested in pizza from a pizzeria called Mariano's:

Conversation A

Sabina: "One cheese pizza, please."
Mariano's: "Sure! What size would you like? Would you like the combo deal?"
Sabina: "Medium please, no combo!"

Conversation B

Alex: "Tell me about your ingredients. What's your halal policy?"
Mariano's: "We are halal-certified by the American Halal Foundation and have separate cooking areas."
Alex: "What kind of cheese do you use?"

What is the difference between these two simple conversations? The primary distinction comes down to what the subject seeks: either the fulfillment of an order or the dispensing of information. For instance, in Conversation A, Sabina explicitly places an order, initiating a transaction in her favor. Meanwhile, in Conversation B, Alex is requesting information rather than executing a transaction.

This highlights the primary dichotomy that we see in normal conversations with people we don't know well. While many conversations that we hold in businesses we patronize are transactional in nature because they precede an exchange of goods or services, other conversations are informational because we seek information that enables us to make better decisions. In the table below, we can identify how each of these conversations differs:

Conversation A (transactional)

Sabina: Intent
Mariano's: Filtering
Mariano's: Fulfillment
Sabina: Intent

Conversation B (informational)

Alex: Query
Mariano's: Content search
Mariano's: Fulfillment
Alex: Query

As you can see, the components that make up each of these conversations differ considerably. Whereas Sabina fulfills an intent by embarking on a transactional conversation, Alex instead resolves queries by pursuing an informational conversation. In each case, Mariano's is responsible for fulfilling the request, either by winnowing down the options or by searching the content Mariano's has available.
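The distinction between the two conversation types can be sketched in a few lines of Python. This is an illustration under loud assumptions rather than how any real assistant works: the MENU and CONTENT dictionaries are made-up stand-ins for Mariano's data, and classify relies on a crude heuristic (questions and "tell me" requests are treated as queries, everything else as intents).

```python
from typing import Dict, List, Tuple

# Hypothetical data standing in for Mariano's menu and published content.
MENU: Dict[str, List[str]] = {"cheese pizza": ["small", "medium", "large"]}
CONTENT: Dict[str, str] = {
    "halal": "We are halal-certified by the American Halal Foundation "
             "and have separate cooking areas.",
}


def classify(utterance: str) -> str:
    # Crude heuristic (an assumption): questions and "tell me" requests
    # are queries; anything else is treated as an intent.
    text = utterance.strip().lower()
    if text.endswith("?") or text.startswith("tell me"):
        return "informational"
    return "transactional"


def respond(utterance: str) -> Tuple[str, str]:
    """Return the conversation type and the interface's response."""
    text = utterance.lower()
    if classify(utterance) == "transactional":
        for item, sizes in MENU.items():
            if item in text:
                chosen = [size for size in sizes if size in text]
                if chosen:
                    # Fulfillment: nothing is left to winnow down.
                    return "transactional", f"One {chosen[0]} {item}, coming up!"
                # Filtering: narrow the options before fulfilling the order.
                return "transactional", "Sure! What size would you like?"
        return "transactional", "What would you like to order?"
    # Content search: look for a matching topic in the available content.
    for topic, answer in CONTENT.items():
        if topic in text:
            return "informational", answer
    return "informational", "Sorry, I couldn't find anything about that."


print(respond("One cheese pizza, please."))
print(respond("What's your halal policy?"))
```

Note that this sketch is stateless, so a follow-up like "Medium please, no combo!" would not be tied back to the earlier order; a real interface would need to track the conversation across turns.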

Here is a final pair of conversations that demonstrates how transactional and informational conversations differ from one another:

Conversation C (transactional)

Trey: "Two tickets for Solo please."
Cinema: "Sure! Would you like tickets for 12:30pm, 2:30pm, or 5pm?"
Trey: "12:30pm."

Conversation D (informational)

Abdirahim: "When are you open this weekend?"
Mall: "We open at 10am on Saturday and 11am on Sunday."
Abdirahim: "What movies are playing now at the cinema?"

When we look at existing examples of conversational interfaces in the wild, we can see that many of the chatbots we use every day adhere to these paradigms. Many conversational interfaces are only positioned for transactional conversations and cannot hold informational conversations. Others dispense information but cannot complete a transaction. Still others can hold both types of conversations.

Web content versus conversational content

Because developers reading this column are primarily concerned with Drupal, content management, and funneling content into conversational interfaces, we take a short detour to consider the historical background and ramifications of the gulf between web-based content and conversational content. Spoken web content originally emerged from the growing emphasis on accessibility, but today it takes a variety of forms.

The history of spoken content is a complicated and colorful one. In the early 2000s, text-to-speech (TTS) services that dictated web pages to users with disabilities over a telephone were commonplace and remain widespread today. Meanwhile, assistive technologies such as screen readers became ubiquitous in the intervening years, though their user experiences are often complex and suffer from frequent interruptions that disrupt the flow of content.

Both text-to-speech services and screen readers suffer from what I call a low verbosity tolerance. When reading a particularly long-winded passage of text, we can vary the speed at which we read the prose before us. On the other hand, when listening to the same text, our attention spans suffer considerably due to our lack of control over how that text is presented to us.

This brings us to the chief differences between web-based and conversational content. Web content tends to be screen-based, longform prose that is link- and reference-rich with a high verbosity tolerance. Meanwhile, conversational content tends to consist of short, individual utterances presented in a limited visual context or in an aural and verbal interface. By necessity, conversational content has fewer links and references.
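One way to picture the move from longform prose to short utterances is a small Python sketch that groups sentences into chunks under a word budget. The function name to_utterances and the max_words threshold are assumptions made for illustration; a real project would choose its own limits through usability testing, and a sentence longer than the budget is simply kept whole here.

```python
import re
from typing import List


def to_utterances(text: str, max_words: int = 20) -> List[str]:
    """Group sentences into utterances of at most max_words words where possible."""
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    utterances: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new utterance when adding this sentence would exceed the budget.
        if current and count + words > max_words:
            utterances.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        utterances.append(" ".join(current))
    return utterances


page = (
    "We open at 10am on Saturday. We open at 11am on Sunday. "
    "The cinema is on the second floor."
)
for utterance in to_utterances(page, max_words=12):
    print(utterance)
```

Each utterance can then be delivered as a single chat message or spoken response, giving the user a natural point at which to interrupt or ask for more.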

Whether it is presented in a chatbot or a voice assistant, conversational content has to succeed in presenting information in a comprehensive but efficient way in order to provide an optimal user experience. Armed with the right approaches when it comes to information architecture, design, content strategy, and usability, you can ensure any content is presentable both on the web and on a conversational interface.

Conclusion

In this post from Experience Express, we began a journey through the inner workings of conversational interfaces. First, we surveyed the spectrum of interfaces available in the conversational landscape. We then identified the types of conversations that we build interfaces for today, transactional and informational, and examined the differences between web content and conversational content.

In the next installment of this series, we'll dive into our first area of interest, information architecture. We'll compare some of the ways in which information architecture is conceived on the web and in conversational interfaces and identify some best practices for your own interface. Finally, we'll discuss some of the ramifications of a conversational approach to information architecture that have substantial consequences for your project. All aboard!

preston.so