Missing-Data Techniques: Recognition with Incomplete Spectrograms

Jon Barker

University of Sheffield, UK

14.1 Introduction

In Part Four of this book, the mismatch between the statistics of noisy observations and those of noise-free speech was presented as the fundamental problem facing robust ASR systems. Techniques were described that aimed to improve performance by reducing this mismatch. This section of the book takes a rather different perspective that emphasises information loss rather than model mismatch. The difference between these perspectives can be illustrated by the visual analogy presented in Figure 14.1.

Figure 14.1 A visual analogy comparing two views of the robust ASR problem: noise as a source of model mismatch versus noise as a source of information loss. The distortion in the top panel is invertible, and the original signal could theoretically be recovered if the model for the distortion were known. The occlusion in the bottom panel is not invertible, and information has been genuinely lost.


The top panel of the figure shows a word written in a familiar font that has been partially distorted: the lower half of the word has been passed through a ripple effect. It is clear that the distorted image will be poorly matched to models that have been trained on undistorted characters. However, it is also clear that as long as the parameters of the distortion are known, ...
