Techniques for Noise Robustness in Automatic Speech Recognition

by Rita Singh, Tuomas Virtanen, Bhiksha Raj

November 2012

Intermediate to advanced

514 pages

17h 40m

English

Wiley

1.1 Scope of the Book; 1.2 Outline; 1.3 Notation
2.1 Introduction; 2.2 Speech Recognition Viewed as Bayes Classification; 2.3 Hidden Markov Models; 2.4 HMM-Based Speech Recognition
3.1 Errors in Bayes Classification; 3.2 Bayes Classification and ASR; 3.3 External Influences on Speech Recordings; 3.4 The Effect of External Influences on Recognition; 3.5 Improving Recognition under Adverse Conditions

4.1 Introduction; 4.2 Signal Analysis and Synthesis; 4.3 Voice Activity Detection; 4.4 Noise Power Spectrum Estimation; 4.5 Adaptive Filters for Signal Enhancement; 4.6 ASR Performance; 4.7 Conclusions
5.1 The Problem with Mixtures; 5.2 Multichannel Mixtures; 5.3 Single-Channel Mixtures; 5.4 Variations and Extensions; 5.5 Conclusions
6.1 Speaker Tracking; 6.2 Conventional Microphone Arrays; 6.3 Conventional Adaptive Beamforming Algorithms; 6.4 Spherical Microphone Arrays; 6.5 Spherical Adaptive Algorithms; 6.6 Comparative Studies; 6.7 Comparison of Linear and Spherical Arrays for DSR; 6.8 Conclusions and Further Reading
7.1 Introduction; 7.2 The Speech Signal; 7.3 Spectral Processing; 7.4 Cepstral Processing; 7.5 Influence of Distortions on Different Speech Features; 7.6 Summary and Further Reading
8.1 Introduction; 8.2 Some Attributes of Auditory Physiology and Perception; 8.3 “Classic” Auditory Representations; 8.4 Current Trends in Auditory Feature Analysis; 8.5 Summary; Acknowledgments
9.1 Life in an Ideal World; 9.2 MMSE-SPLICE; 9.3 Discriminative SPLICE; 9.4 Model-Based Feature Enhancement; 9.5 Switching Linear Dynamic System; 9.6 Conclusion
10.1 Introduction; 10.2 The Effect of Reverberation; 10.3 Approaches to Reverberant Speech Recognition; 10.4 Feature Domain Model of the Acoustic Impulse Response; 10.5 Bayesian Feature Enhancement; 10.6 Experimental Results; 10.7 Conclusions; Acknowledgment
11.1 Introduction; 11.2 Acoustic Model Adaptation and Noise Robustness; 11.3 Maximum A Posteriori Reestimation; 11.4 Maximum Likelihood Linear Regression; 11.5 Discriminative Training; 11.6 Conclusion
12.1 Introduction; 12.2 The Model-Based Approach; 12.3 Signal Feature Domains; 12.4 Interaction Models; 12.5 Inference Methods; 12.6 Efficient Likelihood Evaluation in Factorial Models; 12.7 Current Directions
13.1 Introduction; 13.2 Traditional Training Methods for Robust Speech Recognition; 13.3 A Brief Overview of Speaker Adaptive Training; 13.4 Feature-Space Noise Adaptive Training; 13.5 Model-Space Noise Adaptive Training; 13.6 Noise Adaptive Training using VTS Adaptation; 13.7 Discussion; 13.8 Conclusion
14.1 Introduction; 14.2 Classification with Incomplete Data; 14.3 Energetic Masking; 14.4 Meta-Missing Data: Dealing with Mask Uncertainty; 14.5 Some Perspectives on Performance
15.1 Introduction; 15.2 Missing-Data Techniques; 15.3 Correlation-Based Imputation; 15.4 Cluster-Based Imputation; 15.5 Class-Conditioned Imputation; 15.6 Sparse Imputation; 15.7 Other Feature-Reconstruction Methods; 15.8 Experimental Results; 15.9 Discussion and Conclusion; Acknowledgments
16.1 Introduction; 16.2 Auditory Scene Analysis; 16.3 Computational Auditory Scene Analysis; 16.4 CASA Strategies; 16.5 Integrating CASA with ASR; 16.6 Concluding Remarks; Acknowledgment
17.1 Introduction; 17.2 Observation Uncertainty; 17.3 Uncertainty Decoding; 17.4 Feature-Based Uncertainty Decoding; 17.5 Model-Based Joint Uncertainty Decoding; 17.6 Noisy CMLLR; 17.7 Uncertainty and Adaptive Training; 17.8 In Combination with Other Techniques; 17.9 Conclusions

Content preview from Techniques for Noise Robustness in Automatic Speech Recognition

Reverberant Speech Recognition

Reinhold Haeb-Umbach, Alexander Krueger

University of Paderborn, Germany

10.1 Introduction

From a usage point of view, there are several reasons why distant-talking microphones are preferable to close-talking microphones in many applications of automatic speech recognition (ASR). The first is convenience: freeing the user from holding a microphone or wearing a headset increases ease of use and thus raises the acceptance of appliances or services operated by voice commands. The second is safety: in numerous applications the hands are needed for more important tasks than holding a microphone to capture the user's speech. Examples include the hands-free control of a cellular phone or a car navigation system while driving, or a surgeon's control of some apparatus during an operation. Finally, moving the microphone away from the speaker's mouth is in line with the paradigm of the disappearing computer and ambient intelligence, which has been advocated for several years [1]. It describes the vision of technology that is invisible and embedded in our surroundings, yet present whenever we need it. Interacting with it should be simple and effortless, and speech, as a “remote control” that users carry with them at all times, is the ideal means of interaction.

However, increasing the distance between the speaker and the microphone has dramatic consequences on the quality of the ...

ISBN: 9781118392669
