Audio Source Separation and Speech Enhancement

by Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot
October 2018
Intermediate to advanced
504 pages
18h 50m
English
Wiley
Content preview from Audio Source Separation and Speech Enhancement

12 Multichannel Clustering and Classification Approaches

Michael I. Mandel, Shoko Araki, and Tomohiro Nakatani

This chapter describes methods for estimating time‐frequency masks of source activity from multichannel observations using clustering and classification techniques. Such methods are similar to the speech presence probability (SPP) estimates in Chapter 11, but can be applied to any signal, not just speech, and can be applied in the presence of nonstationary noise, not just stationary noise. Clustering algorithms estimate time‐frequency masks by grouping together time‐frequency bins with similar characteristics. Classification algorithms estimate these masks based on a comparison of time‐frequency bins in the signal under analysis to those of previously seen training data. Because clustering algorithms only compare parts of the test signal to one another, they typically do not require training data. Classification algorithms, in contrast, are extremely dependent on the characteristics and quality of their training data. In the notation of Section 1.3.3, clustering is generally a learning‐free method, while classification is a separation‐based training method.
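To make the clustering idea concrete, here is a minimal sketch of how time‐frequency bins with similar spatial characteristics might be grouped into binary masks. It is not the chapter's own algorithm, which is not visible in this preview. The choice of features (interchannel level and phase differences of a two‐channel short‐time Fourier transform), the use of plain k‐means, and the names `kmeans` and `cluster_masks` are all assumptions made for illustration.

```python
import numpy as np

def kmeans(features, n_clusters, n_iter=20):
    """Plain k-means with farthest-point initialization (illustrative helper)."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [features[0]]
    for _ in range(1, n_clusters):
        d = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):  # leave empty clusters where they are
                centers[k] = features[labels == k].mean(axis=0)
    return labels

def cluster_masks(X_left, X_right, n_sources=2):
    """Binary masks of shape (n_sources, F, T) from two-channel STFTs.

    Assumes each time-frequency bin is dominated by a single source.
    """
    eps = 1e-12
    # Interchannel level difference in dB and phase difference in radians.
    ild = 20 * np.log10((np.abs(X_right) + eps) / (np.abs(X_left) + eps))
    ipd = np.angle(X_right * np.conj(X_left))
    # Represent the phase by cosine and sine so that wrapping is harmless.
    features = np.stack(
        [ild.ravel(), np.cos(ipd).ravel(), np.sin(ipd).ravel()], axis=1
    )
    labels = kmeans(features, n_sources).reshape(X_left.shape)
    return np.stack([labels == k for k in range(n_sources)])

# Synthetic check: every bin belongs to exactly one of two sources, and each
# source has its own made-up interchannel gain and phase shift.
rng = np.random.default_rng(0)
F, T = 64, 40
true_mask = rng.random((F, T)) < 0.5
S = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
gain = np.where(true_mask, 2.0, 0.5)
phase = np.where(true_mask, 0.5, -0.5)
X_left = S
X_right = S * gain * np.exp(1j * phase)
masks = cluster_masks(X_left, X_right)
```

No training data is involved: the bins of the test signal are compared only with one another. The cluster indices are arbitrary, so which mask corresponds to which source has to be decided separately. Real recordings would also need more robust features and a frequency‐dependent treatment of phase than this toy setup uses.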

This chapter is also related to Chapter 14, which describes a complete generative model of the joint spatial and time‐frequency characteristics of multichannel signals that can be used to separate or enhance target signals of interest. The methods described in the current chapter, in contrast, focus on estimating only the ...

Publisher Resources

ISBN: 9781119279891