Techniques for Noise Robustness in Automatic Speech Recognition

by Rita Singh, Tuomas Virtanen, Bhiksha Raj
November 2012
Intermediate to advanced
514 pages
17h 40m
English
Wiley

16

Computational Auditory Scene Analysis and Automatic Speech Recognition

Arun Narayanan, DeLiang Wang

The Ohio State University, USA

16.1 Introduction

The human auditory system is, in a way, an engineering marvel. It does things that powerful modern machines find extremely difficult. For instance, it can follow the lyrics of a song when the input is a mixture of speech and musical accompaniment. Another example is a party. Usually multiple groups of people are talking, with laughter, ambient music and other sound sources in the background. The input our auditory system receives through the ears is a mixture of all of these. In spite of such a complex input, we are able to listen selectively to an individual speaker, attend to the music in the background, and so on. In fact, this ability of ‘segregation’ is so instinctive that we take it for granted without wondering about the complexity of the problem our auditory system solves.

Colin Cherry, in the 1950s, coined the term ‘cocktail party problem’ while trying to describe how our auditory system functions in such an environment [12]. He conducted a series of experiments to study the factors that help humans perform this complex task [11]. A number of theories have since been proposed to explain the observations made in those experiments [11,12,70]. Helmholtz had, in the mid-nineteenth century, reflected upon the complexity of this signal by using the example of a ball ...

Publisher Resources

ISBN: 9781118392669