Techniques for Noise Robustness in Automatic Speech Recognition

by Rita Singh, Tuomas Virtanen, Bhiksha Raj
November 2012
Intermediate to advanced
514 pages
17h 40m
English
Wiley

14 Missing-Data Techniques: Recognition with Incomplete Spectrograms

Jon Barker

University of Sheffield, UK

14.1 Introduction

In Part Four of this book, the mismatch between the statistics of noisy observations and those of noise-free speech was presented as the fundamental problem facing robust ASR systems. Techniques were described that aimed to improve performance by reducing this mismatch. This section of the book takes a rather different perspective that emphasises information loss rather than model mismatch. The difference between these perspectives can be illustrated by the visual analogy presented in Figure 14.1.

Figure 14.1 A visual analogy comparing two views of the robust ASR problem: noise as a source of model mismatch versus noise as a source of information loss. The distortion in the top panel is invertible and the original signal could theoretically be recovered if the model for the distortion was known. The occlusion in the bottom panel is not invertible and information has been genuinely lost.
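
The distinction drawn in the caption can be mimicked with a toy numerical sketch. This is an illustration only and is not taken from the chapter: the function names, the affine distortion, the mask and all values are hypothetical, chosen just to show that a known distortion can be undone exactly, whereas an occlusion maps different clean signals to the same observation.

```python
# Toy illustration only: every name and value here is hypothetical.

def distort(signal, gain, offset):
    """Invertible distortion: a known affine map applied to every sample."""
    return [gain * x + offset for x in signal]

def undo_distort(observed, gain, offset):
    """Invert the affine map (exact for the integer inputs used below)."""
    return [(y - offset) // gain for y in observed]

def occlude(signal, mask, filler):
    """Occlusion: samples where mask is False are overwritten by filler."""
    return [x if keep else filler for x, keep in zip(signal, mask)]

clean = [3, 1, 4, 1, 5, 9, 2, 6]

# Mismatch view: with the distortion parameters known, the clean signal
# is recovered exactly.
distorted = distort(clean, gain=2, offset=7)
recovered = undo_distort(distorted, gain=2, offset=7)

# Information-loss view: a different clean signal yields the same occluded
# observation, so no inverse can tell the two apart.
mask = [True, True, False, False, True, False, True, True]
occluded = occlude(clean, mask, filler=0)
other_clean = [3, 1, 8, 8, 5, 0, 2, 6]
same_observation = occlude(other_clean, mask, filler=0) == occluded

# Only the unoccluded samples still carry information about the source.
reliable = [x for x, keep in zip(occluded, mask) if keep]
```

In this sketch `recovered` equals `clean`, while `same_observation` is true even though `other_clean` differs from `clean` at the occluded positions.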

The top panel of the figure shows a word written in a familiar font that has been partially distorted: the lower half of the word has been passed through a ripple effect. It is clear that the distorted image will be poorly matched to models that have been trained on undistorted characters. However, it is also clear that as long as the parameters of the distortion are known, ...

ISBN: 9781118392669