Chapter 5. The Elements of Conversation

With one burst of energy, I can issue a pretty sophisticated directive, such as, “Get me one large turkey hoagie with everything on it, and a small Coke.”

Think about how many steps it would take me to communicate that same command in a graphical user interface, say on an iPhone. I’d have to select the sandwich (turkey hoagie) from a drop-down list of sandwiches, then select the size of the sandwich (another drop-down, maybe), then tap the “all” toppings checkbox (assuming this option was offered), and finally select the drink and its size. That would be at least five distinct steps, and this doesn’t even include the tap(s) for unlocking the app. That’s clearly far more effort than speaking one sentence. With the one spoken sentence, the effort equation shifts away from the user and onto the interface. The user can speak naturally, without the artificial devices of drop-down menus, checkboxes, radio buttons, and so on, and the burden falls on the voicebot to figure out what the user wants by interpreting the words they speak.
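To make that shift of burden concrete, here is a minimal sketch of the interpreting side: a toy parser that pulls the same five selections out of the single spoken sentence. Everything in it is an assumption made for illustration. The menu vocabulary, the slot names, and the functions `find_item` and `parse_order` are hypothetical, and the input is assumed to be already-transcribed text. A production voicebot would rely on a trained language-understanding component rather than keyword matching.

```python
import re

# Hypothetical menu vocabulary, invented for this sketch.
SIZES = ("small", "medium", "large")
SANDWICHES = ("turkey hoagie", "italian hoagie", "veggie hoagie")
DRINKS = ("coke", "sprite", "water")


def find_item(text, items):
    """Return (item, size) for the first menu item found in text, else None.

    The size is the size word immediately before the item, or None if absent.
    """
    size_pattern = "|".join(SIZES)
    for item in items:
        pattern = rf"(?:\b({size_pattern})\s+)?\b{re.escape(item)}\b"
        match = re.search(pattern, text)
        if match:
            return item, match.group(1)
    return None


def parse_order(utterance):
    """Map one transcribed sentence onto the selections a GUI would collect."""
    text = utterance.lower()
    order = {}
    sandwich = find_item(text, SANDWICHES)
    if sandwich:
        order["sandwich"], order["sandwich_size"] = sandwich
        # Stands in for the "all" toppings checkbox.
        order["all_toppings"] = "with everything" in text
    drink = find_item(text, DRINKS)
    if drink:
        order["drink"], order["drink_size"] = drink
    return order


print(parse_order(
    "Get me one large turkey hoagie with everything on it, and a small Coke."
))
# {'sandwich': 'turkey hoagie', 'sandwich_size': 'large', 'all_toppings': True,
#  'drink': 'coke', 'drink_size': 'small'}
```

Each key in the result corresponds to one of the taps in the graphical version. The user supplies all of them in one breath, and the work of sorting them out moves into the code.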

The ability of a voicebot to parse a rich, complex user statement, however, does not obviate the need for the voicebot to engage the user in a stepwise, back-and-forth interaction. Users don’t pack their commands to the hilt just to save steps. They will try to be efficient, but only to a point. When I call to order a pizza, I don’t say in one breath, ...
