Extraction of Speech from Mixture Signals

Paris Smaragdis

University of Illinois at Urbana-Champaign, USA

5.1 The Problem with Mixtures

Traditionally, signal-processing and pattern-recognition algorithms tend to look at signals under the assumption of little, if any, interference. The vast majority of algorithms for speech recognition, pitch detection, phonetic classification, etc., assume that the input is a relatively clean speech signal, potentially contaminated by a simple noise term such as additive Gaussian noise. This tendency is partly pedagogical, since one should know how to treat a clean signal before moving to more complex cases, and partly a result of our limited ability to analyze signals mathematically. Once we are confronted with mixture signals, much of our signal-processing intuition and many of our mathematical foundations no longer apply directly, and there is little algorithmic basis for mapping well-defined operations on clean speech to cases of speech plus interference.

A practical way out of this problem is to consider preprocessing steps that attempt to separate the mixed signals and provide a reasonably clean version of the speech component. Once that version is obtained, and under the assumption that the interference has been reasonably well removed, one can apply operations that assume a clean speech input.

Historically, the field of source separation has seen many approaches based on a varying range of schools of thought ...
