Chapter 16. Cloud Speech: audio-to-text conversion

This chapter covers

  • An overview of speech recognition
  • How the Cloud Speech API works
  • How Cloud Speech pricing is calculated
  • An example of generating automated captions from audio content

When we talk about speech recognition, we generally mean taking an audio stream (for example, an MP3 file of a book on tape) and turning it into text (in this case, back into the actual written book). This process sounds straightforward, but as you may know, language is a particularly tricky human construct. For instance, the psychological phenomenon called the McGurk effect changes what we hear based on what we see. In one classic example, the sound “ba” can be perceived as “fa” so long as we see someone’s ...
