14 Gaussian Model Based Multichannel Separation
Alexey Ozerov and Hirokazu Kameoka
The Gaussian framework for multichannel source separation models vectors of STFT coefficients as following multivariate complex Gaussian distributions. It allows the spatial and spectral models of the source spatial images to be specified and their parameters to be estimated jointly. Multichannel nonnegative matrix factorization, illustrated in Figure 14.1, is one of the most popular such methods. It combines nonnegative matrix factorization (NMF) (see Chapter 8) with narrowband spatial modeling (see Chapter 3). Besides NMF, the Gaussian framework makes it possible to reuse many other single‐channel spectral models in a multichannel scenario. It differs from the frameworks in Chapters 11, 12, and 13 in that it typically uses more advanced generative spectral models. In terms of the general taxonomies introduced in Chapter 1, it covers a wide range of audio source separation scenarios, including over‐ or underdetermined mixtures and weakly or strongly guided separation, and a wide range of methods that are either learning‐free or based on unsupervised or supervised source modeling.
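To make the combination of a spectral and a spatial model concrete, the following is a minimal sketch, not the chapter's own implementation. It assumes each source spatial image at frequency f and frame n is zero-mean complex Gaussian with covariance v[j, f, n] * R[j, f], where the variances v come from an NMF product W[j] @ H[j] and R[j, f] is a time-invariant spatial covariance matrix. Under these assumptions the posterior mean of each source image is given by a multichannel Wiener filter. The function name `multichannel_wiener`, the array shapes, and the regularizer `eps` are illustrative choices, and the parameters are taken as already estimated.

```python
import numpy as np

def multichannel_wiener(X, W, H, R, eps=1e-10):
    """Estimate source spatial images under a local Gaussian model (sketch).

    X : (F, N, I) complex mixture STFT (F bins, N frames, I channels)
    W : list of J arrays (F, K), nonnegative NMF spectral patterns
    H : list of J arrays (K, N), nonnegative NMF activations
    R : (J, F, I, I) Hermitian positive semidefinite spatial covariances
    Returns an array (J, F, N, I) of source image estimates.
    """
    n_channels = X.shape[-1]
    # Spectral model: NMF variances v[j, f, n] = sum_k W[j][f, k] * H[j][k, n]
    V = np.stack([Wj @ Hj for Wj, Hj in zip(W, H)])           # (J, F, N)
    # Source image covariances: v[j, f, n] * R[j, f]
    Sigma_c = V[..., None, None] * R[:, :, None, :, :]        # (J, F, N, I, I)
    # Mixture covariance: sum over sources, lightly regularized (assumption)
    Sigma_x = Sigma_c.sum(axis=0) + eps * np.eye(n_channels)  # (F, N, I, I)
    # Wiener gain per source: Sigma_c @ inv(Sigma_x), broadcast over j
    G = Sigma_c @ np.linalg.inv(Sigma_x)                      # (J, F, N, I, I)
    # Apply each gain matrix to the mixture vector x[f, n]
    return np.einsum('jfnab,fnb->jfna', G, X)
```

Because the gains of all sources sum (up to `eps`) to the identity matrix, the estimated source images add back up to the mixture, which gives a quick sanity check on any set of parameters.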
In Section 14.1 we introduce the multichannel Gaussian framework. In Section 14.2 we provide a detailed list of spectral and spatial models. We explain how to estimate the parameters of these models in Section 14.3. We give a detailed presentation of a few methods in Section 14.4 and provide a summary in Section ...