Techniques for Noise Robustness in Automatic Speech Recognition

by Rita Singh, Tuomas Virtanen, Bhiksha Raj
November 2012
Intermediate to advanced
514 pages
17h 40m
English
Wiley

3

The Problem of Robustness in Automatic Speech Recognition

Bhiksha Raj (1), Tuomas Virtanen (2), Rita Singh (1)

(1) Carnegie Mellon University, USA; (2) Tampere University of Technology, Finland

This chapter deals primarily not with what makes automatic speech recognition (ASR) systems work, but with some of the factors that make them go wrong. As mentioned in Section 1.1, ASR systems often make errors in conditions in which a human listener could continue to hold a conversation effortlessly. Most real-life situations in which people converse with one another or with an automated system are fraught with acoustic adversity. The speech that is finally heard may be distorted by a variety of external influences that are unrelated to what was spoken but that affect its characteristics. While human listeners are largely unaffected by these distortions, ASR systems can be highly sensitive to them. In other words, ASR systems are not robust to distortions in the speech signal in the way that humans are. In this chapter, we discuss some of the reasons for this lack of robustness.

We recall that the problem of automatic speech recognition is fundamentally one of Bayesian classification. Recognition errors in ASR systems are a consequence of misclassification. Therefore, we begin by briefly discussing the rationale behind Bayesian classification and the conditions under which it can perform poorly. Later in the chapter, we relate these to the causes of errors in ASR, describe the various types of distortions that affect ...
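
As a rough illustration of how a Bayesian classifier can perform poorly, the sketch below sets up a toy two-class problem and compares its error rate on data that match the class models with its error rate on distorted data. Everything in it is an assumption made for illustration and is not taken from the chapter: the one-dimensional Gaussian class models, the equal priors, and the treatment of the distortion as a constant shift plus added variance. The names `bayes_classify` and `error_rate` are hypothetical.

```python
import math

# Hypothetical class models (assumed for illustration only):
# two classes with equal priors and unit-variance Gaussian likelihoods.
PRIORS = (0.5, 0.5)
MEANS = (-1.0, 1.0)
VAR = 1.0


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(x, mean, var):
    """Cumulative distribution of a 1-D Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def bayes_classify(x):
    """Pick the class with the largest prior-weighted likelihood."""
    scores = [p * gaussian_pdf(x, m, VAR) for p, m in zip(PRIORS, MEANS)]
    return scores.index(max(scores))


def error_rate(shift=0.0, extra_var=0.0):
    """Error of the classifier when the data are shifted and spread.

    With equal priors and equal variances, the rule in bayes_classify
    reduces to a threshold at the midpoint of the two class means, so the
    error can be computed in closed form. The classifier itself is never
    updated: only the data distribution changes.
    """
    threshold = 0.5 * (MEANS[0] + MEANS[1])
    var = VAR + extra_var
    # Class 0 is misclassified when an observation lands above the threshold.
    err0 = 1.0 - normal_cdf(threshold, MEANS[0] + shift, var)
    # Class 1 is misclassified when an observation lands below the threshold.
    err1 = normal_cdf(threshold, MEANS[1] + shift, var)
    return PRIORS[0] * err0 + PRIORS[1] * err1


matched = error_rate()
mismatched = error_rate(shift=1.0, extra_var=1.0)
print(f"matched error:    {matched:.4f}")
print(f"mismatched error: {mismatched:.4f}")
```

In this toy setting the decision rule is the best possible one for the data it was built for, yet its error rate rises once the observations no longer follow the assumed class distributions. The closed-form error is used instead of random sampling so that the comparison is deterministic.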


ISBN: 9781118392669