12 Multichannel Clustering and Classification Approaches
Michael I. Mandel, Shoko Araki, and Tomohiro Nakatani
This chapter describes methods for estimating time‐frequency masks of source activity from multichannel observations using clustering and classification techniques. Such methods are similar to the speech presence probability (SPP) estimates in Chapter 11, but they can be applied to any signal, not just speech, and in the presence of nonstationary noise, not just stationary noise. Clustering algorithms estimate time‐frequency masks by grouping together time‐frequency bins with similar characteristics. Classification algorithms estimate these masks by comparing time‐frequency bins in the signal under analysis to those of previously seen training data. Because clustering algorithms only compare parts of the test signal to one another, they typically do not require training data. Classification algorithms, in contrast, depend heavily on the characteristics and quality of their training data. In the notation of Section 1.3.3, clustering is generally a learning‐free method, while classification is a separation‐based training method.
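To make the clustering idea concrete, the following is a minimal sketch, not the chapter's own algorithm. It assumes a single per-bin feature (a synthetic interchannel level difference in dB, generated here rather than measured), two sources, and a plain two-cluster k-means. The function name `cluster_mask` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def cluster_mask(feature, n_iter=20):
    """Group time-frequency bins into two clusters with 1-D k-means.

    `feature` is a (frequency, frame) array holding one value per bin.
    Returns one binary mask per cluster and the final cluster centers.
    """
    x = feature.ravel()
    # Deterministic initialization: start the centers at the extremes.
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(n_iter):
        # Assign each bin to its nearest center.
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # Move each center to the mean of its assigned bins.
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    labels = labels.reshape(feature.shape)
    return labels == 0, labels == 1, centers

# Synthetic data (an assumption): each bin is dominated by one of two
# sources, which produce level differences near -6 dB and +6 dB.
rng = np.random.default_rng(0)
n_freq, n_frames = 64, 50
true_source = rng.integers(0, 2, size=(n_freq, n_frames))
ild = np.where(true_source == 0, -6.0, 6.0)
ild = ild + rng.normal(0.0, 1.0, size=(n_freq, n_frames))

mask0, mask1, centers = cluster_mask(ild)
accuracy = np.mean(mask1 == (true_source == 1))
```

No training data is involved: the masks come only from comparing bins of the same signal with one another. A classification approach would instead fit a model on labeled bins from separate training recordings and apply it to the test signal.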
This chapter is also related to Chapter 14, which describes a complete generative model of the joint spatial and time‐frequency characteristics of multichannel signals that can be used to separate or enhance target signals of interest. The methods described in the current chapter, in contrast, focus on estimating only the ...