How to classify interactions for conversational interfaces

June 8, 2021

Preorders are now available for my new book Voice Content and Usability, coming June 22nd! Want to know more about A Book Apart’s first-ever title on voice interface design? Preorder the book, learn what’s inside, and subscribe for more insights like this.

We all have an innate, primordial sense of what a conversation is. As I’ve written before, it’s “a social means of communication between two or more people that straddles a spectrum” between small-talk banter on the one end and decision-making or new information on the other. It’s “a context-dependent joint activity that builds consensus, conveys social cues, trawls for information, and guides transactions.”

What separates a conversation from a single, atomic conversational interaction is harder to pin down, because conversations don’t fit into neatly defined parameters besides the ones just mentioned. After all, two people can enter into a conversation that visits and revisits topics, covers any number of needs and issues, and may accomplish multiple things in succession. A human conversation can last a few minutes or even the entire length of a play, like Samuel Beckett’s Waiting for Godot.

As designers working with voice interfaces and chatbots, however, we need to draw a clear distinction so that we avoid interactions that serve no clear purpose. The distinction that many authors and practitioners rely on is this: a single conversational interaction addresses and realizes one clear outcome, rather than several. In other words, a conversation may be composed of multiple conversational interactions, as a session with Amazon Alexa can be, but a single conversational interaction may also constitute an entire conversation on its own.

In this article, I want to walk through a few key approaches that authors and academics alike have used in the past to classify conversational interactions for the purposes of improving how we design and build both written and spoken conversational interfaces. We’ll first explore a three-way split between prosocial, informational, and transactional interactions before turning our attention to conversational interaction styles. For more insights, consult my new book Voice Content and Usability from A Book Apart, which covers informational interactions at length and is available for preorder or preview.

Classifying conversational interactions

As I wrote in my previous article about Ask GeorgiaGov, we can classify conversational interactions into three categories:

“We interact with voice interfaces for mostly the same reasons we enter into conversations with other people, according to Michael McTear, Zoraida Callejas, and David Griol in The Conversational Interface. Generally, we start up a conversation because we need something done (such as a transaction), because we want to know something (information of some sort), or simply because we’re social animals and want someone to talk to (conversation for conversation’s sake). Respectively, these three categories—transactional, informational, and prosocial—also characterize essentially every spoken interaction with a voice interface.
“Even these days, prosocial conversations are more for fun and games than emotionally captivating and uplifting. That leaves two genres of conversations we can have with one another that a machine can easily have with us too: a transactional exchange realizing some outcome (what chatbot designer Amir Shevat in Designing Bots calls a task-led conversation, e.g. “buy coffee”) and an informational dialogue teaching us something new (what Shevat identifies as a topic-led conversation, e.g. “discuss a movie”). I discuss this key distinction at length in Voice Content and Usability, available for preorder or preview now.”

For more information about Ask GeorgiaGov, incidentally, you can refer back to my previous series on conversational content strategy and, of course, my book Voice Content and Usability, available for preorder and preview and undergirded throughout the text by the Ask GeorgiaGov case study.

Transactional (task-led) interactions

Unless you’re using a food delivery app, ordering a Hawaiian pizza with extra pineapple generally means having a conversation, whether by phone call, video call, text message, or chatbot. Even when we walk up to the counter and place an order, the conversation quickly pivots from an initial smattering of neighborly small talk to the real mission at hand: a pizza generously topped with pineapple, as it should be.

Alison: Hey, how’s it going?
Burhan: Hi, welcome to Crust Deluxe! It’s cold out there. How can I help you?
Alison: Can I get a Hawaiian pizza with extra pineapple?
Burhan: Sure, what size?
Alison: Large.
Burhan: Anything else?
Alison: No thanks, that’s it.
Burhan: Something to drink?
Alison: I’ll have a bottle of Coke.
Burhan: You got it. That’ll be $13.55 and about 15 minutes.

Each progressive disclosure in this transactional or task-led conversation reveals more and more of the desired outcome of the transaction: a service rendered or a product delivered. Transactional conversations, and transactional interactions, have certain key traits: they’re direct, to the point, and economical. They dispense with the pleasantries quickly.
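
One common way to model this kind of progressive disclosure in a conversational interface is slot filling: the interface keeps asking until every detail of the order is known. The following is only a minimal sketch under my own assumptions. The slot names (item, size, drink), the Order class, and the prompts are hypothetical and chosen to mirror the exchange between Alison and Burhan, not taken from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical slots for the pizza order above, in the order they are asked.
REQUIRED_SLOTS = ("item", "size", "drink")

# Hypothetical prompts, loosely modeled on Burhan's questions.
PROMPTS = {
    "item": "What would you like to order?",
    "size": "Sure, what size?",
    "drink": "Something to drink?",
}


@dataclass
class Order:
    """Tracks which details of the transaction have been disclosed so far."""
    slots: Dict[str, str] = field(default_factory=dict)

    def fill(self, name: str, value: str) -> None:
        """Record one disclosed detail; reject slots we do not know about."""
        if name not in REQUIRED_SLOTS:
            raise KeyError(f"unknown slot: {name}")
        self.slots[name] = value

    def next_prompt(self) -> Optional[str]:
        """Return the prompt for the first unfilled slot, or None when done."""
        for name in REQUIRED_SLOTS:
            if name not in self.slots:
                return PROMPTS[name]
        return None

    def is_complete(self) -> bool:
        """The transaction can be realized once every slot is filled."""
        return self.next_prompt() is None


order = Order()
order.fill("item", "Hawaiian pizza with extra pineapple")
print(order.next_prompt())   # Sure, what size?
order.fill("size", "large")
order.fill("drink", "bottle of Coke")
print(order.is_complete())   # True
```

Each call to fill corresponds to one disclosure in the dialogue, and the interaction is finished the moment nothing remains to ask, which is what keeps transactional exchanges so economical.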

Informational (topic-led) interactions

Meanwhile, some conversations are primarily about obtaining information. If you arrived late to a hotel and awoke having forgotten the final deadline to raid the complimentary continental breakfast, you might make a call to reception to find out. Though we have a social mini-conversation at the beginning to establish politeness, we’re after much more.

Chiene: Hi, is this reception?
Dinh: Yes, this is Dinh speaking, how’s your stay so far?
Chiene: Maybe a little too great. I overslept!
Dinh: (laughs) That’s our new bedding. It’s amazing. What can I do for you?
Chiene: I just wanted to know, when does breakfast close?
Dinh: You rang just in time. We’re shutting down breakfast in 15 minutes at 11am.
Chiene: Do you know if there’s a waffle iron there too?
Dinh: Yes, it’s still on.
Chiene: Thanks, I’ll be right there!

This is a very different dialogue. Here, the goal is to get a certain set of facts. These investigative journeys to uncover more of the truth are what characterize an informational or topic-led conversation. Informational conversations might be more long-winded than transactional conversations by necessity, but only insofar as they eventually reach the information that we’re looking for.

This isn’t to say that transactional and informational interactions can’t coexist in the same overarching conversation. Alison could have asked Burhan what beverages are available, and Chiene could have requested that Dinh keep the waffle iron switched on. But at their core, once split to the point of indivisibility, most conversations we can conceivably have with voice interfaces, and each other, are either informational or transactional.
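
To make the idea of splitting a conversation "to the point of indivisibility" more concrete, here is a rough sketch of how a conversation might be represented as a sequence of single-outcome interactions. The Kind, Interaction, and tally names are my own illustrative assumptions, and labeling the opening greeting as prosocial is a judgment call rather than something the taxonomy dictates.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Kind(Enum):
    TRANSACTIONAL = "transactional"  # what Shevat calls task-led
    INFORMATIONAL = "informational"  # what Shevat calls topic-led
    PROSOCIAL = "prosocial"          # conversation for conversation's sake


@dataclass(frozen=True)
class Interaction:
    """One indivisible interaction that realizes exactly one outcome."""
    kind: Kind
    outcome: str


def tally(conversation: Iterable[Interaction]) -> Counter:
    """Count how many interactions of each kind a conversation contains."""
    return Counter(interaction.kind for interaction in conversation)


# Alison's pizza order, extended with the hypothetical beverage question.
pizza_call = [
    Interaction(Kind.PROSOCIAL, "exchange greetings"),
    Interaction(Kind.INFORMATIONAL, "learn which beverages are available"),
    Interaction(Kind.TRANSACTIONAL, "order a large Hawaiian pizza and a Coke"),
]

counts = tally(pizza_call)
print(counts[Kind.TRANSACTIONAL])  # 1
print(counts[Kind.INFORMATIONAL])  # 1
```

The point of the structure is that each Interaction carries exactly one kind and one outcome, while the conversation as a whole is free to mix kinds.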

Crafting informational or topic-led interactions is the subject of my book Voice Content and Usability, which is available for preorder or preview.

Conversational interaction styles

Though this taxonomy works neatly for most interactions with conversational interfaces, there is something eerily soulless about reducing our entire capacity for conversation to two seemingly ill-fitting types. After all, among the acknowledged pitfalls of conversational interfaces are their stilted delivery and their uncanny aloofness. In Studies in Conversational UX Design, user experience researchers Robert Moore and Raphael Arar contend that the voice interfaces of today are inadequate for true sustained conversation, precisely because of this rigidity:

“Creating a user interface that approximates [natural conversation] requires modeling the patterns of human conversation, either through manual design or machine learning. Rather than avoiding this complexity, by producing simplistic interactions, we should embrace the complexity of natural conversation because it mirrors the complexity of the world. The result will be machines that we can talk to in a natural way, instead of merely machines that we can interact with through unnatural uses of natural language.”

Moore and Arar go on to enumerate four interaction styles they observed in most modern chatbots and virtual assistants:

In the system-centric interaction style, the interface only interprets queries and commands. In these highly unnatural interactions, the interface merely responds to the user and restarts from the beginning upon the conclusion of each result, and no context is retained across successive interactions.

In the content-centric interaction style, the interface issues responses to frequently asked questions (FAQs) and can only answer individual questions without retaining context across queries. Content-centric interfaces recite lengthy document-like responses and recognize topics but not deeper discussion about them.

In the visual-centric interaction style, the interface is paired with a visual and physical component, typically borrowing elements from web and mobile applications, that is directly manipulable by the user to provide input.

In the conversation-centric interaction style, the interface is capable of natural human conversation and can handle conversation management on its own, namely by understanding and maintaining context throughout the interaction.
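
The thread running through these four styles is whether context survives from one turn to the next. The toy sketch below contrasts a handler that forgets everything between turns with one that remembers the last topic, using Chiene's breakfast questions as input. Everything here is an assumption made for illustration (the FACTS table, the keyword-spotting parse function, and the ContextualAssistant class), and it is in no way Moore and Arar's implementation or a stand-in for real natural language understanding.

```python
from typing import Optional, Tuple

# Hypothetical knowledge base keyed by (topic, attribute).
FACTS = {
    ("breakfast", "closing time"): "Breakfast closes at 11am.",
    ("breakfast", "waffle iron"): "Yes, the waffle iron is still on.",
}
TOPICS = ("breakfast",)
ATTRIBUTES = ("closing time", "waffle iron")
FALLBACK = "Sorry, I didn't catch that."


def parse(utterance: str) -> Tuple[Optional[str], Optional[str]]:
    """Toy keyword spotting: return (topic, attribute); either may be None."""
    text = utterance.lower()
    topic = next((t for t in TOPICS if t in text), None)
    attribute = next((a for a in ATTRIBUTES if a in text), None)
    return topic, attribute


def stateless_reply(utterance: str) -> str:
    """Answers each query in isolation; no context carries over."""
    return FACTS.get(parse(utterance), FALLBACK)


class ContextualAssistant:
    """Remembers the most recent topic so follow-up questions can omit it."""

    def __init__(self) -> None:
        self.last_topic: Optional[str] = None

    def reply(self, utterance: str) -> str:
        topic, attribute = parse(utterance)
        topic = topic or self.last_topic  # fall back to remembered context
        if topic is not None:
            self.last_topic = topic
        return FACTS.get((topic, attribute), FALLBACK)


first = "What is the closing time for breakfast?"
follow_up = "Is there a waffle iron there too?"

print(stateless_reply(first))      # Breakfast closes at 11am.
print(stateless_reply(follow_up))  # Sorry, I didn't catch that.

assistant = ContextualAssistant()
print(assistant.reply(first))      # Breakfast closes at 11am.
print(assistant.reply(follow_up))  # Yes, the waffle iron is still on.
```

The follow-up question never mentions breakfast, so only the handler that retained context can answer it. Maintaining context is just one ingredient of the conversation management that the conversation-centric style calls for, but it is the one whose absence users tend to notice first.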

While system-centric and content-centric interaction styles in some ways characterize transactional and informational conversations respectively, conversation-centric interactions characterize the sort of idle conversation we might have with Alexa: somewhat faithful to an authentic conversation we might have in passing at the grocery store.

Conclusion

Though we’re not there yet, I examine some of the implications and requirements of conversation-centric design in my new book Voice Content and Usability (preorder or preview), which discusses a content-centric case study in the form of Ask GeorgiaGov, the first voice interface for residents of the state of Georgia. Achieving authentic conversation-centric design is a critical stepping stone on the way to what Mark Curtis calls the conversational singularity, that moment in time—a dream of futurists everywhere—when the currently rehearsed interactions we have with conversational interfaces become indistinguishable from the conversations we have with one another.

But are conversation-centric interactions and the conversational singularity something that interface users actually want? The ethical implications and questions of algorithmic oppression notwithstanding, how true is it really that we want our chats with chatbots and our dialogues with voice interfaces to be as human as possible? Rooted in all of this discourse is, of course, the issue of what it means to be human and to have a mind of one’s own. And you can find some of these answers in my book, available for preorder now.

preston.so