CHAPTER 22

FEATURE EXTRACTION FOR ASR

22.1 INTRODUCTION

In previous chapters, we have introduced some general classes of feature extraction that researchers and system developers have found useful for the representation of speech. Filter banks, cepstral analysis, and LPC are indeed the generic representations of choice for a range of applications in speech and audio processing. However, for each application area, there are specific representations that have been developed, and they often have some of the characteristics of more than one of these archetypes.

For current ASR systems, the goal has generally been to find a representation that is relatively stable across different examples of the same speech sound, despite differences in speaker or environmental characteristics. In this chapter, we discuss a few of the common approaches. For most of these, the representation is computed roughly once every 10 ms over a window of 20 or 30 ms. We also describe some of the common techniques used to further process the feature vectors so that the overall system is robust to simple linear distortions of the input signal (that is, so that it produces the same recognition results despite these distortions). Finally, we briefly discuss a few of the many research approaches being explored for improved feature extraction.
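The framing schedule described above (a new analysis roughly every 10 ms over a window of 20 or 30 ms) can be illustrated with a minimal sketch. It is not taken from the text: the 16 kHz sampling rate, the Hamming window, the 20 ms default, and the name `frame_signal` are all assumptions chosen for illustration, and the log power spectrum at the end is only a generic starting point rather than any specific feature vector from this chapter.

```python
import numpy as np


def frame_signal(signal, sample_rate, window_ms=20.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping, Hamming-windowed frames.

    Hypothetical helper: window_ms and hop_ms follow the rough values in
    the text; the Hamming taper is an assumed (but common) choice.
    """
    signal = np.asarray(signal, dtype=float)
    window_len = int(round(sample_rate * window_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))

    # Too short for even one full window: return an empty frame matrix.
    if len(signal) < window_len:
        return np.empty((0, window_len))

    # Only full windows are kept; trailing samples are dropped.
    num_frames = 1 + (len(signal) - window_len) // hop_len

    # Row i of the index matrix selects samples [i*hop, i*hop + window).
    starts = hop_len * np.arange(num_frames)
    indices = starts[:, None] + np.arange(window_len)[None, :]

    return signal[indices] * np.hamming(window_len)


# One second of synthetic noise stands in for speech (assumed 16 kHz).
sample_rate = 16000
signal = np.random.default_rng(0).standard_normal(sample_rate)

frames = frame_signal(signal, sample_rate)

# One log power spectrum per frame; the small constant avoids log(0).
log_power = np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-10)

print(frames.shape, log_power.shape)
```

With these assumed settings, each frame holds 320 samples and consecutive frames overlap by half, so one second of input yields on the order of one hundred feature frames.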

22.2 COMMON FEATURE VECTORS

Over ...

Source: Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Second Edition.