Emmanuel Vincent, Tuomas Virtanen, and Sharon Gannot

Source separation and speech enhancement research has made dramatic progress in the last 30 years. It is now a mainstream topic in speech and audio processing, with hundreds of papers published every year. Separation and enhancement performance have greatly improved and successful commercial applications are increasingly being deployed. This chapter provides an overview of research and development perspectives in the field. We do not attempt to cover all perspectives currently under discussion in the community. Instead, we focus on five directions in which we believe major progress is still possible: getting the most out of deep learning, exploiting phase relationships across time‐frequency bins, improving the estimation accuracy of multichannel parameters, addressing scenarios involving multiple microphone arrays or other sensors, and accelerating industry transfer. These five directions are covered in Sections 19.1, 19.2, 19.3, 19.4, and 19.5, respectively.

19.1 Advancing Deep Learning

In just a few years, deep learning has emerged as a major paradigm for source separation and speech enhancement. Deep neural networks (DNNs) can model the complex characteristics of audio sources by making efficient use of large amounts (typically hours) of training data. They perform well on mixtures involving conditions similar to those in the training set, and they are surprisingly robust to unseen conditions (Vincent et al., 2017).
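To make the DNN paradigm concrete, the following is a minimal sketch of the common mask-estimation approach: a network maps each frame of a noisy magnitude spectrogram to a time-frequency mask in [0, 1], which is then applied to the mixture to attenuate noise-dominated bins. All dimensions, weights, and names here are hypothetical illustrations, and the weights are random; in practice they would be learned from hours of paired noisy/clean training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: F frequency bins, T time frames, H hidden units.
F, T, H = 257, 100, 128

# Toy "noisy" magnitude spectrogram (stand-in for |STFT| of a real mixture).
noisy_mag = np.abs(rng.standard_normal((F, T)))

# Untrained single-hidden-layer network, for illustration only; a real system
# learns these weights by minimizing e.g. the mask or spectrum estimation error.
W1 = rng.standard_normal((H, F)) * 0.01
b1 = np.zeros((H, 1))
W2 = rng.standard_normal((F, H)) * 0.01
b2 = np.zeros((F, 1))

def estimate_mask(mag):
    """Map each noisy frame to a time-frequency mask with values in [0, 1]."""
    h = np.maximum(W1 @ mag + b1, 0.0)           # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # sigmoid keeps the mask bounded

mask = estimate_mask(noisy_mag)
enhanced_mag = mask * noisy_mag  # attenuate bins the network deems noise-dominated
```

The enhanced magnitude would then be combined with the mixture phase and inverted back to a waveform; more elaborate architectures (recurrent or convolutional layers, multichannel inputs) follow the same input-to-mask template.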
