As noted in [9], one of the key measurements used in speech processing is the short-term spectrum. In all of its many forms, this measure is a local spectral estimate, typically computed over a relatively short region of speech (e.g., 20 or 30 ms). It has proved useful for a range of speech applications, including speech coding and recognition. In each case, the basic notion is to capture the time-varying spectral envelope of the speech, and in each case it is desirable to reduce the effects of pitch on this estimate: pitch is either handled separately (as in a vocoder or a tone-language speech-recognition system) or discarded as irrelevant to the discrimination (as in most English-language speech-recognition systems). Therefore, in speech applications, the short-term spectral algorithm is usually designed to estimate a spectral envelope that is less influenced by the pitch harmonics in voiced speech.
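
To make the idea of a local spectral estimate concrete, the following is a minimal sketch of a raw short-term power spectrum, computed frame by frame with a direct DFT. It is an illustration under stated assumptions, not a method taken from the text: the function name `short_term_power_spectrum`, the Hamming window, the 25 ms frame length, the 10 ms hop, and the 8 kHz synthetic test tone are all choices made here for the example. Note that this raw estimate applies no envelope smoothing, so for voiced speech it would still show the pitch harmonics that the approaches described in this chapter aim to suppress.

```python
import cmath
import math


def short_term_power_spectrum(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Return one power spectrum (bins 0..N/2) per analysis frame.

    Hypothetical helper for illustration; frame_ms and hop_ms are assumed
    defaults, not values prescribed by the text.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))

    # Hamming window (an assumed choice) to taper the frame edges.
    window = [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]

    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        power = []
        # Direct DFT: slow (O(N^2)) but dependency-free and easy to read.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            power.append(abs(acc) ** 2)
        spectra.append(power)
    return spectra


# Usage: 0.1 s of a 1000 Hz tone sampled at 8 kHz (synthetic test input).
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(800)]
spectra = short_term_power_spectrum(tone, sample_rate)

# With 200-sample frames the bin spacing is 8000 / 200 = 40 Hz.
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * sample_rate / 200
```

Each row of `spectra` is the estimate for one 25 ms frame, so the sequence of rows traces how the spectrum changes over time. In practice an FFT routine would replace the direct DFT; it is written out here only to keep the sketch self-contained.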

In this chapter and the following two, we will describe three basic approaches to the estimation of the short-term spectral envelope: filter banks, cepstral processing, and linear predictive coding (LPC). The first and oldest approach is that of temporally smoothed power estimates from a bank of bandpass filters. Since much of the inspiration for ...
