Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Second Edition
by Ben Gold, Nelson Morgan, Dan Ellis
PART VI
AUTOMATIC SPEECH RECOGNITION

I always heard it couldn't be done, but sometimes it don't always work.
–Casey Stengel
Building on both the mathematical techniques of Part II and the feature extraction methods of Part V, we now focus on a single important application area: speech recognition. Each of the eight chapters in Part VI describes a major aspect of speech-recognition systems. In Chapter 22, we extend the archetypal processing paradigms of Part V to the kinds of signal-processing features most commonly used in automatic speech recognition (ASR) systems as of 1998. Chapter 23 introduces the linguistic categories most frequently used in such systems, such as phones and phonemes. The next two chapters describe methods for determining a sequence of words from measures of similarity or dissimilarity between training examples of words and new test data. In both cases, the primary technique described is a search method known as dynamic programming. In Chapter 24, the distance between sounds in training and test examples serves as the measure of dissimilarity. Although this approach per se is rarely used today, its description is often a useful introduction to more advanced techniques. In Chapter 25, the statistical generalization of this approach is developed, and its relation ...
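To make the idea of distance-based matching by dynamic programming concrete, here is a minimal sketch of dynamic time warping (DTW) for isolated-word recognition. It assumes each utterance is already a sequence of feature vectors (tuples of floats), uses Euclidean distance as the local cost, and allows only the three simplest step types with no slope constraints or path weighting. The function names `dtw_distance` and `recognize` are hypothetical illustrations, not the method as developed in Chapter 24.

```python
import math


def dtw_distance(test, template):
    """Minimum accumulated distance aligning two feature-vector sequences (hypothetical helper)."""
    n, m = len(test), len(template)
    # D[i][j] holds the best accumulated cost aligning test[:i] with template[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local dissimilarity: Euclidean distance between the two frames.
            local = math.dist(test[i - 1], template[j - 1])
            # Dynamic programming recursion: extend the cheapest predecessor path.
            D[i][j] = local + min(
                D[i - 1][j],      # advance in the test utterance only
                D[i][j - 1],      # advance in the template only
                D[i - 1][j - 1],  # advance in both
            )
    return D[n][m]


def recognize(test, templates):
    """Return the word whose stored template is closest to the test utterance (hypothetical helper)."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))


# Toy one-dimensional "features", invented purely for illustration.
templates = {"yes": [(0.0,), (1.0,), (2.0,)], "no": [(5.0,), (5.0,)]}
test = [(0.0,), (0.0,), (1.0,), (2.0,)]
print(recognize(test, templates))
```

Here the test utterance is a time-stretched copy of the "yes" template, so DTW can align it at zero cost despite the different lengths. Real systems add slope constraints and step weights, and compare richer feature vectors than these one-dimensional toys.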