Microphone Arrays

John McDonough1, Kenichi Kumatani2

1Carnegie Mellon University, USA 2Disney Research, USA

This contribution addresses the class of techniques suitable for performing speech recognition, not on the signal captured by a single microphone, but on that obtained by combining the signals from several microphones. The techniques discussed here differ from those presented in Chapter 5 in that they rest on a pair of assumptions:

1. The geometry of the array of microphones is fixed and known.
2. The positions of the active speakers relative to the array are known or can be accurately estimated.

Such techniques, known collectively as beamforming, have been the subject of intense interest in recent years within the acoustic array processing research community. Unfortunately, they have been largely ignored in the mainstream automatic speech recognition field, although this may rapidly change given the recent release and widespread popularity of the Microsoft Kinect® platform. The simplest beamforming algorithm, the delay-and-sum beamformer, uses only this geometric knowledge (that is, the arrangement of the microphones and the speaker's position) to compensate for the time delays of the signals arriving at each sensor and then sum them. More sophisticated adaptive beamformers minimize the total output power of the array under the constraint that the desired source must be unattenuated. The conventional adaptive beamforming ...
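The delay-and-sum idea described above can be illustrated with a minimal narrowband sketch. It is not taken from the chapter: the function names, the speed of sound of 343 m/s, and the simplification that every microphone receives the source at unit amplitude (only the phase differs) are all assumptions made for illustration. For a single frequency, the weights undo the relative propagation delay at each microphone and average the result, so a source at the assumed position passes with unit gain.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def propagation_delays(mic_positions, source_position):
    """Travel time in seconds from the source to each microphone."""
    return [math.dist(m, source_position) / SPEED_OF_SOUND for m in mic_positions]


def phase_vector(mic_positions, source_position, freq_hz):
    """Relative phase at each microphone for a unit-amplitude source.

    Delays are measured relative to the closest microphone, and amplitude
    decay with distance is ignored (an assumption of this sketch).
    """
    delays = propagation_delays(mic_positions, source_position)
    ref = min(delays)
    return [cmath.exp(-2j * math.pi * freq_hz * (d - ref)) for d in delays]


def delay_and_sum_weights(mic_positions, source_position, freq_hz):
    """Weights w = v / N, where v is the phase vector of the look position."""
    v = phase_vector(mic_positions, source_position, freq_hz)
    return [x / len(v) for x in v]


def beamformer_response(weights, mic_positions, test_position, freq_hz):
    """Magnitude of the output w^H x for a unit source at test_position."""
    x = phase_vector(mic_positions, test_position, freq_hz)
    return abs(sum(w.conjugate() * xi for w, xi in zip(weights, x)))


if __name__ == "__main__":
    # Hypothetical four-element linear array with 5 cm spacing (metres).
    mics = [(-0.075, 0.0), (-0.025, 0.0), (0.025, 0.0), (0.075, 0.0)]
    speaker = (0.0, 2.0)      # assumed known speaker position
    interferer = (2.0, 0.0)   # hypothetical off-axis source
    w = delay_and_sum_weights(mics, speaker, 2000.0)
    print("gain toward speaker:   ", beamformer_response(w, mics, speaker, 2000.0))
    print("gain toward interferer:", beamformer_response(w, mics, interferer, 2000.0))
```

The gain toward the assumed speaker position is 1 by construction, while a source elsewhere is generally attenuated because its phases no longer align after compensation. A wideband implementation would apply such weights per frequency bin, or apply the equivalent time delays directly to the sampled signals.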
