3

The Problem of Robustness in Automatic Speech Recognition

Bhiksha Raj¹, Tuomas Virtanen², Rita Singh¹

¹Carnegie Mellon University, USA; ²Tampere University of Technology, Finland

This chapter deals primarily not with what makes automatic speech-recognition (ASR) systems work, but with some of the factors that make them go wrong. As mentioned in Section 1.1, ASR systems often make errors in conditions in which a human listener could continue to hold a conversation effortlessly. Most real-life situations in which people converse with one another or with an automated system are fraught with acoustic adversity. The speech that is finally heard may be distorted by a variety of external influences, unrelated to what was spoken, that affect its characteristics. While human listeners are largely unaffected by these distortions, ASR systems can be highly sensitive to them. In other words, ASR systems are not robust to distortions in the speech signal in the way that humans are. In this chapter, we discuss some of the reasons for this lack of robustness.

Recall that the problem of automatic speech recognition is fundamentally one of Bayesian classification. Recognition errors in ASR systems are a consequence of misclassification. Therefore, we begin by briefly discussing the rationale behind Bayesian classification and the conditions under which it can perform poorly. Later in the chapter, we relate these to the causes of errors in ASR, describe the various types of distortions that affect ...
