As noted in [9], one of the key measurements used in speech processing is the short-term spectrum. In all of its many forms, this measure consists of some kind of local spectral estimate, typically measured over a relatively short region of speech (e.g., 20 or 30 ms). This measure has been shown to be useful for a range of speech applications, including speech coding and recognition. In each case, the basic notion is that of capturing the time-varying spectral envelope for the speech, and in each case it is desirable to reduce the effects of pitch on this estimate; either pitch is used separately (as with a vocoder or a tone language speech-recognition system), or it is generally discarded as irrelevant to the discrimination (as in most English language speech-recognition systems). Therefore, in speech applications, the short-term spectral algorithm is usually designed to estimate a spectral envelope that has a reduced influence from the pitch harmonics in voiced speech.

In this chapter and the following two, we will describe three basic approaches to the estimation of the short-term spectral envelope: filter banks, cepstral processing, and linear predictive coding (LPC). The first and oldest approach is that of temporally smoothed power estimates from a bank of bandpass filters. Since much of the inspiration for ...

Get Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Second Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.