Skip to Content
Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Second Edition
book

Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Second Edition

by Ben Gold, Nelson Morgan, Dan Ellis
August 2011
Beginner to intermediate
688 pages
21h 28m
English
Wiley-Interscience
Content preview from Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Second Edition

CHAPTER 24

image

DETERMINISTIC SEQ UENC RECOGNITION FOR ASR

24.1 INTRODUCTION

In the past few chapters, we have established the basics for understanding the static pattern-classification aspect of speech recognition.

  1. Signal representation: in most ASR systems, some function of the local short-term spectrum is used. Typically, this consists of cepstral parameters corresponding to a smoothed spectrum. These parameters are computed every 10 ms or so from a Hamming-windowed speech segment that is 20–30 ms in length. Each of these temporal steps is referred to as a frame.
  2. Classes: in most current systems, the categories that are associated with the short-term signal spectra are phones or subphones,1 as noted in Chapter 23. In some systems, though, the classes simply consist of implicit categories associated with the training data.

Given these choices, one can use any of the techniques described in Chapter 8 to train deterministic classifiers (e.g., minimum distance, linear discriminant functions, neural networks, etc.) that can classify signal segments into one of the classes. However, as noted earlier, speech recognition includes both pattern classification and sequence recognition; recognition of a string of linguistic units from the sequence of segment spectra requires finding the best match overall, not just locally. This would not be so much of a problem if the local match was always ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Audio Processes

Audio Processes

David Creasey
Audio Source Separation and Speech Enhancement

Audio Source Separation and Speech Enhancement

Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot

Publisher Resources

ISBN: 9780470195369Purchase book