The Basics of Automatic Speech Recognition

Rita Singh¹, Bhiksha Raj¹, Tuomas Virtanen²

¹Carnegie Mellon University, USA; ²Tampere University of Technology, Finland

2.1 Introduction

To understand the techniques described later in this book, it is important to understand how automatic speech recognition (ASR) systems function. This chapter briefly outlines the framework employed by ASR systems based on hidden Markov models (HMMs).

Most mainstream ASR systems are designed as probabilistic Bayes classifiers that identify the word sequence most likely to explain a given recorded acoustic signal. To do so, they use an estimate of the probabilities of possible word sequences in the language, together with the probability distributions of the acoustic signals for each word sequence. Both the probability distributions of word sequences and those of the acoustic signals for any word sequence are represented through parametric models. Probabilities of word sequences are modeled by various forms of grammars or N-gram models. The probabilities of the acoustic signals are modeled by HMMs.
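The decision rule described above can be illustrated with a small sketch. By Bayes' rule, P(W | X) = P(X | W) P(W) / P(X), and because P(X) is the same for every candidate word sequence W, the recognizer chooses Ŵ = argmax_W P(W) P(X | W). The Python sketch below is a toy under several stated assumptions: the bigram table, the per-hypothesis HMM parameters, and all function names (lm_log_prob, hmm_log_likelihood, decode) are hypothetical; observations are discrete symbols rather than real acoustic feature vectors; and candidate word sequences are enumerated explicitly, whereas practical recognizers search a very large hypothesis space. The acoustic score is computed with the standard forward algorithm, in the log domain to avoid numerical underflow.

```python
import math

# Hypothetical toy bigram language model: P(word | previous word).
# "<s>" marks the start of the sentence; all numbers are invented.
BIGRAM = {
    ("<s>", "recognize"): 0.4,
    ("recognize", "speech"): 0.5,
    ("<s>", "wreck"): 0.1,
    ("wreck", "a"): 0.3,
    ("a", "nice"): 0.2,
    ("nice", "beach"): 0.3,
}


def safe_log(p):
    """Natural log that maps zero probability to -inf."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Compute log(sum(exp(v))) in a numerically stable way."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def lm_log_prob(words, bigram, floor=1e-6):
    """log P(W) under a bigram model; unseen bigrams get a small floor probability."""
    logp, prev = 0.0, "<s>"
    for w in words:
        logp += math.log(bigram.get((prev, w), floor))
        prev = w
    return logp


def hmm_log_likelihood(obs, init, trans, emit):
    """Forward algorithm: log P(X | HMM) for a discrete-observation HMM.

    init[i]     = P(first state is i)
    trans[i][j] = P(next state is j | current state is i)
    emit[i][o]  = P(observation symbol o | state i)
    """
    n = len(init)
    alpha = [safe_log(init[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        # Sum over all predecessor states, then account for the new observation.
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)


def decode(obs, hypotheses, bigram):
    """Return the word sequence maximizing log P(W) + log P(X | W)."""
    best_words, best_score = None, -math.inf
    for words, (init, trans, emit) in hypotheses.items():
        score = lm_log_prob(words, bigram) + hmm_log_likelihood(obs, init, trans, emit)
        if score > best_score:
            best_words, best_score = words, score
    return best_words, best_score


# Hypothetical two-state left-to-right HMMs, one per candidate word sequence,
# over a toy alphabet of three observation symbols {0, 1, 2}.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
HYPOTHESES = {
    ("recognize", "speech"): ([1.0, 0.0], LEFT_TO_RIGHT,
                              [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]),
    ("wreck", "a", "nice", "beach"): ([1.0, 0.0], LEFT_TO_RIGHT,
                                      [[0.2, 0.6, 0.2], [0.2, 0.6, 0.2]]),
}

OBS = [0, 0, 2, 2]  # toy "acoustic" observation sequence
best, score = decode(OBS, HYPOTHESES, BIGRAM)
print(best, round(score, 3))
```

Working with log probabilities turns the product P(W) P(X | W) into a sum, which is how decode combines the language-model score and the acoustic score.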

In the rest of this chapter, we briefly describe the components and process of ASR outlined above, as a prelude to explaining the circumstances under which ASR may perform poorly and how these relate to the remaining chapters of this book. Since this book primarily addresses factors that affect the acoustic signal, we will only pay cursory attention to the manner in which word-sequence probabilities ...
