Missing-Data Techniques: Feature Reconstruction

Jort Florent Gemmeke¹, Ulpu Remes²

¹KU Leuven, Belgium ²Aalto University School of Science, Finland

15.1 Introduction

Automatic speech recognition (ASR) performance degrades rapidly as speech is corrupted by increasing levels of noise. Missing-data techniques are a family of methods that address noise-robust speech recognition on the basis of the so-called missing-data assumption proposed in [12]. These methods assume that (i) the noisy speech signal can be divided into speech-dominated (reliable) and noise-dominated (unreliable) spectro-temporal components prior to decoding and (ii) the unreliable components do not retain any information about the corresponding clean speech values. This means that the clean speech values corresponding to the noise-dominated components are effectively missing, and speech recognition must proceed with partially observed data.
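As a minimal illustration of assumption (i), the sketch below labels each spectro-temporal component as reliable or unreliable by comparing its local signal-to-noise ratio with a threshold. It is an oracle construction: it assumes that the clean speech and noise powers are known separately, which is not the case in practice. The function name `oracle_mask`, the 0 dB threshold, and the toy values are assumptions made for this example, not taken from the chapter.

```python
import math

def oracle_mask(clean_power, noise_power, snr_threshold_db=0.0):
    """Label components as reliable (True) or unreliable (False).

    clean_power and noise_power are lists of frames; each frame is a list
    of per-channel power values. A component counts as speech-dominated
    when its local SNR exceeds snr_threshold_db (an assumed threshold).
    """
    mask = []
    for clean_frame, noise_frame in zip(clean_power, noise_power):
        row = []
        for s, n in zip(clean_frame, noise_frame):
            if n <= 0.0:
                # No noise energy: reliable whenever speech energy is present.
                row.append(s > 0.0)
            elif s <= 0.0:
                # No speech energy: the component is noise-dominated.
                row.append(False)
            else:
                local_snr_db = 10.0 * math.log10(s / n)
                row.append(local_snr_db > snr_threshold_db)
        mask.append(row)
    return mask

# Toy example: two frames, three frequency channels (hypothetical values).
clean = [[4.0, 0.5, 9.0], [0.1, 2.0, 0.2]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
mask = oracle_mask(clean, noise)
```

In a real system the mask has to be estimated from the noisy signal alone, so a sketch like this is mainly useful for defining what a reliable component means.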

Techniques for speech recognition with missing features fall into roughly two categories: marginalization and feature reconstruction. The marginalization approach, discussed in Chapter 14, disregards the missing components when calculating acoustic model likelihoods: the likelihoods that correspond to the missing components are calculated by integrating over the full range of possible missing-feature values [11]. In this chapter, we focus on the reconstruction approach, where the missing values are substituted (imputed) with clean speech estimates prior to calculating ...
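The difference between the two categories can be sketched with a single diagonal-covariance Gaussian standing in for an acoustic model state. Under that assumption, integrating a missing component over its full range contributes a factor of one, so marginalization reduces to summing log-densities over the reliable components only. Imputation instead fills in the missing components and then evaluates the full feature vector. The helper names and toy values are hypothetical, and the model mean is used as the clean speech estimate purely as a crude placeholder; it does not represent the reconstruction methods of this chapter.

```python
import math

def log_gauss_1d(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def marginal_loglik(obs, mask, means, variances):
    """Marginalization: sum log-densities over reliable components only.

    For a diagonal Gaussian, integrating a missing component over its
    full range contributes log(1) = 0, so it is simply skipped.
    """
    return sum(
        log_gauss_1d(x, m, v)
        for x, reliable, m, v in zip(obs, mask, means, variances)
        if reliable
    )

def impute(obs, mask, estimates):
    """Imputation: keep reliable values, substitute estimates elsewhere."""
    return [x if reliable else e for x, reliable, e in zip(obs, mask, estimates)]

def full_loglik(obs, means, variances):
    """Ordinary log-likelihood over all components of a complete vector."""
    return sum(log_gauss_1d(x, m, v) for x, m, v in zip(obs, means, variances))

# Toy example (hypothetical values): the second component is unreliable.
means = [1.0, 2.0, 3.0]
variances = [1.0, 1.0, 1.0]
noisy = [1.2, 7.5, 2.9]
mask = [True, False, True]

marginal = marginal_loglik(noisy, mask, means, variances)
reconstructed = impute(noisy, mask, means)  # placeholder estimate: model mean
imputed = full_loglik(reconstructed, means, variances)
```

One practical difference visible in this sketch is that `impute` returns a complete feature vector that any unmodified likelihood routine can score, whereas `marginal_loglik` requires the likelihood computation itself to take the mask into account.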
