Designing Voice User Interfaces

by Cathy Pearl
December 2016
Intermediate to advanced
278 pages
6h 21m
English
O'Reilly Media, Inc.

Chapter 4. Speech Recognition Technology

We’ve talked about many of the crucial voice user interface (VUI) design elements. So far, the discussion has been light on the technical details of speech recognition technology itself. This chapter gets more technical, looking under the hood so that your VUI design takes into account (and takes advantage of) that technology. It will also let you confidently reference the underlying technology when explaining your design decisions.

To create a VUI, your app must have one key component: automated speech recognition (ASR). ASR is the technology that takes what a user says and translates that speech into text.
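To make the "speech in, text out" idea concrete, here is a minimal sketch of how an app might wrap ASR behind a small interface so that the engine can be swapped later. Everything in it is an assumption made for illustration: `SpeechRecognizer`, `RecognitionResult`, `CannedRecognizer`, and `handle_utterance` are hypothetical names, not part of any real engine's API, and the stand-in engine simply looks audio up in a table instead of doing any recognition.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class RecognitionResult:
    """Text produced for one utterance (hypothetical type)."""
    transcript: str
    confidence: float  # assumed to range from 0.0 to 1.0


class SpeechRecognizer(Protocol):
    """Anything that turns recorded audio into text (hypothetical interface)."""

    def transcribe(self, audio: bytes) -> RecognitionResult: ...


class CannedRecognizer:
    """Stand-in engine for illustration: looks audio up in a fixed table."""

    def __init__(self, known: dict[bytes, str]) -> None:
        self._known = known

    def transcribe(self, audio: bytes) -> RecognitionResult:
        text = self._known.get(audio)
        if text is None:
            # Nothing matched: report an empty transcript with no confidence.
            return RecognitionResult(transcript="", confidence=0.0)
        return RecognitionResult(transcript=text, confidence=1.0)


def handle_utterance(recognizer: SpeechRecognizer, audio: bytes) -> str:
    """Pass audio through ASR, then let the VUI respond to the text."""
    result = recognizer.transcribe(audio)
    if not result.transcript:
        return "Sorry, I didn't catch that."
    return f"You said: {result.transcript}"


if __name__ == "__main__":
    engine = CannedRecognizer({b"\x01\x02": "play some jazz"})
    print(handle_utterance(engine, b"\x01\x02"))
    print(handle_utterance(engine, b"\xff"))
```

Because the rest of the app depends only on `transcribe`, replacing the stand-in with a wrapper around a real engine should not require changes to the VUI logic.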

Choosing an Engine

So how do you choose your ASR tool? There are free services as well as those that require licensing fees. Some offer free use for development but require payment for commercial use.

As of this writing, there are two major fee-based speech recognition engines: Google and Nuance. Other options in this space include Microsoft’s Bing and iSpeech.

Free ASR tools include the Web Speech API, Wit.ai, Sphinx (from Carnegie Mellon), and Kaldi. Amazon has its own tool, which is free to use, but at the moment it can be used only when creating skills for the Amazon Echo.

Wikipedia has a much more detailed list, which you can access at https://en.wikipedia.org/wiki/List_of_speech_recognition_software.

Some companies offer multiple engines, as well; for example, Nuance has different offerings depending on what ...

ISBN: 9781491955406