Audio Source Separation and Speech Enhancement
by Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot
Preface
Source separation and speech enhancement are among the most studied technologies in audio signal processing. Their goal is to extract one or more source signals of interest from an audio recording involving several sound sources. This problem arises in many everyday situations. For instance, spoken communication is often obscured by concurrent speakers or by background noise, outdoor recordings feature a variety of environmental sounds, and most music recordings involve a group of instruments. When facing such scenes, humans are able to perceive and listen to individual sources so as to communicate with other speakers, navigate a crowded street or memorize the melody of a song. Source separation and speech enhancement technologies aim to give machines similar abilities.
These technologies are already present in our lives today. They allow the industry to extend speech and audio processing systems beyond “clean” single‐source signals recorded with close microphones to multi‐source, reverberant, noisy signals recorded with distant microphones. Some of the most striking examples include hearing aids, speech enhancement for smartphones, and distant‐microphone voice command systems. Current technologies are expected to keep improving and to spread to many other scenarios in the next few years.
Traditionally, speech enhancement has referred to the problem of segregating speech and background noise, while source separation has referred to the segregation ...