Techniques for Noise Robustness in Automatic Speech Recognition

by Rita Singh, Tuomas Virtanen, Bhiksha Raj

November 2012

Intermediate to advanced

514 pages

17h 40m

English

Wiley


Cover
Title Page
Copyright
List of Contributors
Acknowledgments
Chapter 1: Introduction
1.1 Scope of the Book
1.2 Outline
1.3 Notation
Part One: Foundations
Chapter 2: The Basics of Automatic Speech Recognition
2.1 Introduction
2.2 Speech Recognition Viewed as Bayes Classification
2.3 Hidden Markov Models
2.4 HMM-Based Speech Recognition
Chapter 3: The Problem of Robustness in Automatic Speech Recognition
3.1 Errors in Bayes Classification
3.2 Bayes Classification and ASR
3.3 External Influences on Speech Recordings
3.4 The Effect of External Influences on Recognition
3.5 Improving Recognition under Adverse Conditions
Part Two: Signal Enhancement
Chapter 4: Voice Activity Detection, Noise Estimation, and Adaptive Filters for Acoustic Signal Enhancement
4.1 Introduction
4.2 Signal Analysis and Synthesis
4.3 Voice Activity Detection
4.4 Noise Power Spectrum Estimation
4.5 Adaptive Filters for Signal Enhancement
4.6 ASR Performance
4.7 Conclusions
Chapter 5: Extraction of Speech from Mixture Signals
5.1 The Problem with Mixtures
5.2 Multichannel Mixtures
5.3 Single-Channel Mixtures
5.4 Variations and Extensions
5.5 Conclusions
Chapter 6: Microphone Arrays
6.1 Speaker Tracking
6.2 Conventional Microphone Arrays
6.3 Conventional Adaptive Beamforming Algorithms
6.4 Spherical Microphone Arrays
6.5 Spherical Adaptive Algorithms
6.6 Comparative Studies
6.7 Comparison of Linear and Spherical Arrays for DSR
6.8 Conclusions and Further Reading
Part Three: Feature Enhancement
Chapter 7: From Signals to Speech Features by Digital Signal Processing
7.1 Introduction
7.2 The Speech Signal
7.3 Spectral Processing
7.4 Cepstral Processing
7.5 Influence of Distortions on Different Speech Features
7.6 Summary and Further Reading
Chapter 8: Features Based on Auditory Physiology and Perception
8.1 Introduction
8.2 Some Attributes of Auditory Physiology and Perception
8.3 “Classic” Auditory Representations
8.4 Current Trends in Auditory Feature Analysis
8.5 Summary
Acknowledgments
Chapter 9: Feature Compensation
9.1 Life in an Ideal World
9.2 MMSE-SPLICE
9.3 Discriminative SPLICE
9.4 Model-Based Feature Enhancement
9.5 Switching Linear Dynamic System
9.6 Conclusion
Chapter 10: Reverberant Speech Recognition
10.1 Introduction
10.2 The Effect of Reverberation
10.3 Approaches to Reverberant Speech Recognition
10.4 Feature Domain Model of the Acoustic Impulse Response
10.5 Bayesian Feature Enhancement
10.6 Experimental Results
10.7 Conclusions
Acknowledgment
Part Four: Model Enhancement
Chapter 11: Adaptation and Discriminative Training of Acoustic Models
11.1 Introduction
11.2 Acoustic Model Adaptation and Noise Robustness
11.3 Maximum A Posteriori Reestimation
11.4 Maximum Likelihood Linear Regression
11.5 Discriminative Training
11.6 Conclusion
Chapter 12: Factorial Models for Noise Robust Speech Recognition
12.1 Introduction
12.2 The Model-Based Approach
12.3 Signal Feature Domains
12.4 Interaction Models
12.5 Inference Methods
12.6 Efficient Likelihood Evaluation in Factorial Models
12.7 Current Directions
Chapter 13: Acoustic Model Training for Robust Speech Recognition
13.1 Introduction
13.2 Traditional Training Methods for Robust Speech Recognition
13.3 A Brief Overview of Speaker Adaptive Training
13.4 Feature-Space Noise Adaptive Training
13.5 Model-Space Noise Adaptive Training
13.6 Noise Adaptive Training using VTS Adaptation
13.7 Discussion
13.8 Conclusion
Part Five: Compensation for Information Loss
Chapter 14: Missing-Data Techniques: Recognition with Incomplete Spectrograms
14.1 Introduction
14.2 Classification with Incomplete Data
14.3 Energetic Masking
14.4 Meta-Missing Data: Dealing with Mask Uncertainty
14.5 Some Perspectives on Performance
Chapter 15: Missing-Data Techniques: Feature Reconstruction
15.1 Introduction
15.2 Missing-Data Techniques
15.3 Correlation-Based Imputation
15.4 Cluster-Based Imputation
15.5 Class-Conditioned Imputation
15.6 Sparse Imputation
15.7 Other Feature-Reconstruction Methods
15.8 Experimental Results
15.9 Discussion and Conclusion
Acknowledgments
Chapter 16: Computational Auditory Scene Analysis and Automatic Speech Recognition
16.1 Introduction
16.2 Auditory Scene Analysis
16.3 Computational Auditory Scene Analysis
16.4 CASA Strategies
16.5 Integrating CASA with ASR
16.6 Concluding Remarks
Acknowledgment
Chapter 17: Uncertainty Decoding
17.1 Introduction
17.2 Observation Uncertainty
17.3 Uncertainty Decoding
17.4 Feature-Based Uncertainty Decoding
17.5 Model-Based Joint Uncertainty Decoding
17.6 Noisy CMLLR
17.7 Uncertainty and Adaptive Training
17.8 In Combination with Other Techniques
17.9 Conclusions
Index

Overview

Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments in which these systems are used are noisy: for example, a user may call a voice search system from a busy cafeteria or a street. Such conditions degrade the speech recordings and adversely affect recognition performance. As the use of ASR systems grows, knowledge of the state of the art in techniques for dealing with these problems becomes critical to the system and application engineers and researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state of the art in techniques used to improve the robustness of speech recognition systems to these degrading external influences.

Key features:

Reviews all the main noise-robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing-data techniques, and recognition of reverberant speech.
Acts as a timely exposition of the topic in light of the expected wider use of ASR technology in challenging environments.
Addresses robustness issues and signal degradation, both of which are key concerns for practitioners of ASR.
Includes contributions from top ASR researchers from leading research units in the field.


Publisher Resources

ISBN: 9781118392669
