7 End-to-End Speech Recognition Models

DOI: 10.1201/9781003348689-7

Learning Outcomes

After reading this chapter, you will be able to:

  • Identify the appropriate end-to-end ASR system for your downstream applications.
  • Understand the basic concepts behind end-to-end ASR and online streaming ASR systems.
  • Identify the right open-source ASR model for inferencing.

7.1 End-to-End Speech Recognition Models

Before the rise of deep learning, conventional ASR models were complex systems that combined acoustic models, language models, and pronunciation models. The main drawback of these systems was that every module makes its own assumptions about probability distributions. For instance, n-gram language models and HMMs make strong Markovian independence assumptions ...
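
To make the Markovian independence assumption concrete, the following is a minimal sketch of a bigram (2-gram) language model, in which each word is assumed to depend only on the word immediately before it. The function names (train_bigram, bigram_prob, sentence_prob), the toy corpus, and the <s> and </s> padding tokens are illustrative assumptions, not taken from the chapter, and the estimates are unsmoothed maximum-likelihood counts.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences.

    Each sentence is padded with hypothetical <s> and </s> boundary tokens.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts


def bigram_prob(prev, cur, context_counts, bigram_counts):
    """Unsmoothed estimate of P(cur | prev); 0.0 for an unseen context."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / context_counts[prev]


def sentence_prob(tokens, context_counts, bigram_counts):
    """Markov assumption: P(w_1..w_n) is approximated by a product of
    P(w_i | w_{i-1}) terms, ignoring all earlier history."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        prob *= bigram_prob(prev, cur, context_counts, bigram_counts)
    return prob


# Toy corpus, purely for illustration.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
context_counts, bigram_counts = train_bigram(corpus)
p_seen = sentence_prob(["the", "cat", "sat"], context_counts, bigram_counts)
p_unseen = sentence_prob(["the", "dog", "ran"], context_counts, bigram_counts)
```

In this sketch, the sentence "the dog ran" receives probability zero because the pair ("dog", "ran") never occurs in the toy corpus, even though every individual word does. This shows how strongly the model's output depends on the independence assumption and on the counts available for each adjacent pair.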
