17 Application of Source Separation to Robust Speech Analysis and Recognition

Shinji Watanabe, Tuomas Virtanen, and Dorothea Kolossa

This chapter describes applications of source separation techniques to robust speech analysis and recognition, including automatic speech recognition (ASR), speaker/language identification, emotion and paralinguistic analysis, and audiovisual analysis. These are among the most successful applications in audio and speech processing, with commercial products including Google Voice Search, Apple Siri, Amazon Echo, and Microsoft Cortana. Robustness against noise and nontarget speech remains a challenging issue, and source separation and speech enhancement techniques are attracting considerable attention in the speech community.

This chapter systematically describes how source separation and speech enhancement techniques are applied to improve the robustness of these applications. Section 17.1 describes the challenges and opportunities, and Section 17.2 defines the speech analysis and recognition applications considered, with basic formulations. Section 17.3 describes the current state‐of‐the‐art system using source separation as a front‐end method for speech analysis and recognition. Section 17.4 introduces a way of tightly integrating these methods by preserving the uncertainties between them. Section 17.5 provides another possible solution to the robustness issues with the help of cross‐modality information. Section 17.6 concludes the chapter. ...
