7 Single‐Channel Classification and Clustering Approaches

Felix Weninger, Jun Du, Erik Marchi, and Tian Gao

The separation of sources from single‐channel mixtures is particularly challenging. If two or more microphones are available, information on relative amplitudes or relative time delays can be used to identify the sources and help to perform the separation (see Chapter 12). Yet, with only one microphone, this information is not available. Instead, information about the structure of the source signals must be exploited to identify and separate the different components.

Methods for single‐channel source separation can be roughly grouped into two categories: clustering and classification/regression. Clustering algorithms group similar time‐frequency bins. This category notably includes computational auditory scene analysis (CASA) approaches. These rely on psychoacoustic cues in a learning‐free mode, i.e., no models of individual sources are assumed. Instead, generic properties of acoustic signals are exploited. In contrast, classification and regression algorithms are used in separation‐based training. They either predict the source belonging to the target class (regression) or classify the type of source that dominates each time‐frequency bin (classification). Factorial hidden Markov models (HMMs) are generative models that explain the statistics of a mixture based on statistical models of the individual source signals. They therefore rely on source‐based unsupervised training, i.e., training a model for each source from ...
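
To make the per‐bin classification view concrete, here is a minimal sketch that computes an oracle label for each time‐frequency bin. The label is 1 if the target source dominates and 0 otherwise, a target commonly called the ideal binary mask. The sketch assumes that the magnitude spectrograms of the target and the interference are known separately. That is only the case when preparing training labels; a trained classifier would have to predict these labels from the mixture alone. Several names are hypothetical illustrations, not taken from this chapter:

* the function names `ideal_binary_mask` and `apply_mask`
* the `threshold_db` parameter (a local target‐to‐interference ratio threshold, 0 dB by default)
* the toy spectrogram values

```python
import math
from typing import List

# Hypothetical toy representation: rows are frequency bins, columns are time frames.
Spectrogram = List[List[float]]


def ideal_binary_mask(target: Spectrogram, interference: Spectrogram,
                      threshold_db: float = 0.0) -> List[List[int]]:
    """Label each time-frequency bin 1 if the target dominates, else 0.

    A bin is assigned to the target when its local target-to-interference
    power ratio, in dB, exceeds threshold_db (an assumed parameter).
    """
    eps = 1e-12  # avoids division by zero and log of zero in silent bins
    mask = []
    for t_row, i_row in zip(target, interference):
        row = []
        for t_mag, i_mag in zip(t_row, i_row):
            # Power ratio of target to interference in this bin, in dB.
            ratio_db = 10.0 * math.log10((t_mag ** 2 + eps) / (i_mag ** 2 + eps))
            row.append(1 if ratio_db > threshold_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture: Spectrogram, mask: List[List[int]]) -> Spectrogram:
    """Keep mixture bins labelled as target-dominated and zero the rest."""
    return [[m * b for m, b in zip(m_row, b_row)]
            for m_row, b_row in zip(mixture, mask)]


if __name__ == "__main__":
    # Hypothetical toy magnitudes for a 2x2 spectrogram.
    target = [[3.0, 0.1], [0.2, 2.0]]
    interference = [[1.0, 1.0], [1.0, 0.5]]
    # Assumes magnitudes simply add, which is only an approximation.
    mixture = [[4.0, 1.1], [1.2, 2.5]]
    mask = ideal_binary_mask(target, interference)
    print(mask)                       # [[1, 0], [0, 1]]
    print(apply_mask(mixture, mask))  # [[4.0, 0.0], [0.0, 2.5]]
```

In this framing, each bin is a two‐class decision. Raising `threshold_db` makes the labelling more conservative, so fewer bins are attributed to the target. The toy mixture assumes additive magnitudes, which does not hold exactly for real signals with interfering phases.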
