Feature Compensation

Jasha Droppo

Microsoft Research, USA

9.1 Life in an Ideal World

People convey linguistic messages by generating acoustic speech signals. In an ideal world, we could record that signal and derive acoustic features that contain all of the necessary information to achieve perfect recognition accuracy, and nothing else.

In our world, the acoustic features are computed from acoustic signals recorded by a microphone, and the information we need is obscured by noise and other irrelevant variabilities. To make matters worse, these features often suffer from linear and nonlinear channel effects, reverberation, and a significant amount of additive noise. Even in the absence of these distortions, the speech portion of the signal itself contains more information than what was said, including how it was said and who said it.
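To make these distortions concrete, here is a minimal sketch of two of them, a linear channel effect followed by additive noise, applied to a waveform before any features are computed. It is an illustration under stated assumptions, not the chapter's own model: the sine wave standing in for clean speech, the three-tap impulse response, the 10 dB signal-to-noise ratio, and all function names are hypothetical choices made for this example.

```python
import math
import random


def apply_channel(signal, impulse_response):
    """Linear channel effect: causal FIR convolution, truncated to the input length."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if t - k >= 0:
                acc += h * signal[t - k]
        out.append(acc)
    return out


def power(signal):
    """Average power (mean of squared samples)."""
    return sum(s * s for s in signal) / len(signal)


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to the requested signal-to-noise ratio."""
    noise_std = math.sqrt(power(signal) / 10.0 ** (snr_db / 10.0))
    noise = [rng.gauss(0.0, noise_std) for _ in signal]
    return [s + n for s, n in zip(signal, noise)], noise


rng = random.Random(0)
sample_rate = 16000  # assumed sampling rate in Hz

# Stand-in for one second of clean speech: a 220 Hz tone (assumption).
clean = [math.sin(2.0 * math.pi * 220.0 * t / sample_rate) for t in range(sample_rate)]

# Hypothetical short impulse response for the recording channel.
channel = [1.0, 0.5, 0.25]

filtered = apply_channel(clean, channel)       # channel distortion
noisy, noise = add_noise(filtered, 10.0, rng)  # additive noise at 10 dB SNR

# Check how close the realized SNR is to the requested one.
measured_snr_db = 10.0 * math.log10(power(filtered) / power(noise))
```

Features computed from `noisy` rather than from `clean` would carry both distortions; nonlinear channel effects and reverberation are left out of this sketch.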

Figure 9.1 shows the connection between the ideal speech features that we want, the clean speech features that we may be able to get by carefully controlling the environmental conditions at the time of capture, and the noisy speech that we must often tolerate.

Figure 9.1 The goal of feature compensation is to recover more ideal speech features from observed noisy speech features.


This chapter focuses on feature-enhancement techniques, which strive to remove extraneous information and distortion from a sequence of speech-recognition features, while ...

