The world of conversational interfaces is very young. Here are some of the early questions it's working through.
Bots have become hot, fast. Their rise—fueled by advances in artificial intelligence, consumer comfort with chat interfaces, and a stagnating mobile app ecosystem—has been a bright spot in an otherwise darkening venture-capital environment.
I’ve been speaking with a lot of bot creators (most recently at Botness, a conference held in San Francisco at the beginning of June) and have noticed a handful of questions that come up again and again. Looked at closely, those questions make bots seem a little less radical and a lot more feasible.
Text isn’t the final form
The first generation of bots has been text most of the way down. That’s led to some skepticism: you mean I’ll have to choose among 10 hotels by reading down a list in Facebook Messenger?! But bot thinkers are already moving toward a more nuanced model in which some parts of a transaction are handled in text and others in graphical interfaces.
Conversational interfaces can be good for discovering intent: a bot that can offer any coherent response to “find a cool hotel near Google’s HQ” will be valuable, saving its users one search to find the location of Google’s headquarters, another search for hotels nearby, and some amount of filtering to find hotels that are “cool.”
But conversational interfaces are bad at presenting dense information in ways that are easy for human users to sort through. Suppose that hotel bot turns up a list of finalists and asks you to choose: that’s handled much more effectively in a more traditional-looking web interface, where information can be conveyed richly.
Conversational interfaces are also bad at replacing most kinds of web forms, as the pizza-ordering bot, which has ironically become an icon of the field, demonstrates. Better to discern intent (“I want a pizza fast”) and then kick the user to a traditional web form, perhaps one pre-filled with information gleaned from the conversation.
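A minimal Python sketch of that hand-off, assuming the bot has already extracted a few fields from the conversation and that the order form accepts them as URL query parameters. The form URL and the field names (delivery, size) are hypothetical placeholders, not any real service's API.

```python
from urllib.parse import urlencode

# Hypothetical order form; pizza.example.com is a placeholder, not a real service.
ORDER_FORM_URL = "https://pizza.example.com/order"


def handoff_url(gleaned: dict) -> str:
    """Link to a traditional web form, pre-filled with what the chat learned.

    Fields with empty values are dropped so the form leaves them blank
    for the user to fill in directly.
    """
    params = {key: value for key, value in gleaned.items() if value}
    if not params:
        return ORDER_FORM_URL
    return ORDER_FORM_URL + "?" + urlencode(params)


# The bot discerned "I want a pizza fast" but never learned the size.
print(handoff_url({"delivery": "asap", "size": ""}))
# prints https://pizza.example.com/order?delivery=asap
```

The conversation does what it's good at (capturing intent), and the dense, multi-field part of the transaction goes to a form where the user can see and edit every field at once.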
A few people have pointed out that one of WeChat’s killer features is that every business has its phone number listed on its profile; once a transaction becomes too complex for messaging, the customer falls back on a phone call. In the U.S., that fallback is likely to be a GUI, to which you’ll be bounced if your transaction gets to a point where messaging isn’t the best medium.
Discovery hasn’t been solved yet
Part of the reason we’re excited about bots is that the app economy has stagnated: “the 20 most successful developers grab nearly half of all revenues on Apple’s app store,” notes The Economist. It’s hard for users to discover new apps among the millions that already exist, and installing an app involves considerable friction. So, the reasoning goes, bots will be great because they let users bypass the stagnant app stores and offer a smoother “installation” process that’s as simple as messaging a new contact.
Of course, now we’ve got new app stores like Slack’s App Directory. Users are still likely to discover new bots the way they discover apps: by word of mouth, or by searching for a bot associated with a big brand.
The next step, then, would be to promote bots in response to expressions of intention: in its most intrusive implementation, you’d ask your coworkers on Slack if they want to get lunch, and Slack would suggest that you install the GrubHub bot. Welcome back, Clippy, now able to draw on the entire Internet in order to annoy you.
That particular example is universally condemned, and anything that annoying would drive its users away immediately. Still, the community is looking for ways to listen for clear statements of intent and to integrate bot discovery in a way that’s valuable for users and not too intrusive.
Platforms, services, commercial incentives, and transparency
Conversational platforms will have insight into what users might want at a particular moment, and they’ll be tempted to monetize these very valuable intent hooks. Monetization here will take place in a very different environment from the web-advertising environment we’re used to.
Compared to a chat bot’s output, a Google results page is an explosion of information—10 organic search results with titles and descriptions, a bunch of ads flagged as such, and prompts to modify the search by looking for images, news articles, and so on.
A search conducted through a bot is likely to return a “black box” experience: far fewer results, with less information about each. That’s especially true of voice bots—and especially, especially true of voice bots without visual interfaces, like Amazon’s Alexa.
In this much slower and more constrained search environment, users are more likely to accept the bot’s top recommendation than to dig through extended results (indeed, this is a feature of many bots), and there’s less room to disclose an advertising relationship.
Amazon is also an interesting example in that it’s both a bot platform and a service provider. And it has reserved the best namespace for itself; if Amazon decides to offer a ridesharing service (doubtless after noticing that ridesharing is a popular application through Alexa), it will be summoned up by saying “Alexa, call a car.” Uber will be stuck with “Alexa, tell Uber to call a car.”
Compared with other areas, like web search, the messaging-platform ecosystem is remarkably fragmented and competitive. That probably won’t last long, though: as messaging becomes a bigger part of how we communicate, personal networks will tend to pull users onto consolidated platforms.
How important is flawless natural language processing?
Discovery of functionality within bots is the other big discovery challenge, and one that’s also being addressed by interfaces that blend conversational and graphical approaches.
Completely natural language was a dead end in search engines; just ask Jeeves. It turned out that, presented with a service that provided enough value, ordinary users were willing to adapt their language. We switch between different grammars and styles all the time, whether we’re communicating with a computer or with other people: “Would you like to grab lunch?” in speech flows seamlessly into “best burrito downtown sf cheap” in a search bar, and then into “getting lunch w pete, brb” in an IM exchange.
The first killer bot may not need sophisticated NLP in order to take off, but it still faces the challenge of educating its users about its input affordances. A blank input box and blinking cursor are hard to overcome in an era of short attention spans.
Siri used a little bit of humor, combined with a massive community of obsessed Apple fans bent on discovering all of its quirks, to publicize its abilities. Most bots don’t have the latter, and the former is difficult to execute without Apple’s resources. Even with the advantages of size and visibility, Apple still hasn’t managed to get the bulk of its users to move beyond Siri’s simplest tasks, like setting alarms.
(Developers should give a great deal of thought to why alarm-setting is such a compelling use case for Siri: saying “set an alarm for 7:30” slices through several layers of menus and dialogs, and it’s a natural phrase that’s easily parsed into input data for the alarm app. Contrast that with the pizza-ordering use case, in which you’re prompted for the type of pizza you want, then for your address, then for your phone number, and so on: far more separate prompts than you’d encounter in an ordinary pizza-ordering web form.)
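To make that contrast concrete, here is a rough Python sketch. It assumes a hypothetical regular-expression matcher rather than real NLP (the alarm phrase is regular enough that a pattern suffices) and a hypothetical list of pizza-order slots; neither reflects how Siri or any actual pizza bot works.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: one short utterance carries every slot the alarm needs.
ALARM_PATTERN = re.compile(
    r"set an alarm for (\d{1,2}):(\d{2})\s*(am|pm)?", re.IGNORECASE
)

# Hypothetical pizza-order slots; each one still missing costs another prompt.
PIZZA_SLOTS = ["pizza_type", "size", "address", "phone_number"]


def parse_alarm(utterance: str) -> Optional[Tuple[int, int]]:
    """Return (hour, minute) in 24-hour time, or None if the phrase doesn't match."""
    match = ALARM_PATTERN.search(utterance)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    meridiem = (match.group(3) or "").lower()
    if meridiem == "pm" and hour < 12:
        hour += 12
    elif meridiem == "am" and hour == 12:
        hour = 0
    if not (0 <= hour < 24 and 0 <= minute < 60):
        return None
    return hour, minute


def missing_slots(known: dict) -> List[str]:
    """List the slots a slot-filling bot would still have to prompt for."""
    return [slot for slot in PIZZA_SLOTS if slot not in known]


print(parse_alarm("Set an alarm for 7:30"))  # prints (7, 30)
print(missing_slots({"pizza_type": "margherita"}))
# prints ['size', 'address', 'phone_number']
```

One utterance fills every slot the alarm app needs, while the pizza order still needs three more round trips after the user has named a pizza.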
Another challenge: overcoming early features that didn’t work well. We’ve all gotten used to web software that starts out buggy and improves over time. But we tend not to notice constant improvement in the same way on bots’ sparse interfaces, and we’re unwilling to return to tasks that have failed before—especially if, as bots tend to do, they failed after a long and frustrating exchange.
What should we call them?
There’s not much confusion here: people working on bots generally call them bots. The field is young, though, and I wonder whether the name will stick. The word “bot” usually carries negative connotations: spambots, Twitter bots, “are you a bot?”, and botnets, to name a few.
“Agent” might be a better option: an agent represents you, whereas we tend to think of a bot as representing some sinister other. Plus, secret agents and Hollywood agents are cool.