This contribution concerns the class of techniques suitable for performing speech recognition, not on the signal captured by a single microphone, but on that obtained by combining the signals from several microphones. The techniques discussed here differ from those presented in Chapter 5 in that they are based on the pair of assumptions that:
Such techniques, known collectively as beamforming, have been the subject of intense interest in recent years within the acoustic array processing research community. Unfortunately, they have been largely ignored in the mainstream automatic speech recognition field, although this may rapidly change given the recent release and widespread popularity of the Microsoft Kinect® platform. The simplest of beamforming algorithms, the delay-and-sum beamformer, uses only this geometric knowledge (that is, the arrangement of the microphones and the speaker's position) to compensate for the time delays of the signals arriving at each sensor and then additively combine them. More sophisticated adaptive beamformers minimize the total output power of the array under the constraint that the desired source must be unattenuated. The conventional adaptive beamforming ...
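
As an illustration of the delay-and-sum idea described above, the following is a minimal sketch in Python with NumPy, not an implementation taken from this chapter. It assumes free-field propagation from a point source at a known position, a speed of sound of 343 m/s, and sensor signals that are already sampled synchronously. The function name `delay_and_sum` and its parameters are hypothetical, chosen for this example only. The delays are compensated in the frequency domain so that fractional-sample delays can be handled.

```python
import numpy as np


def delay_and_sum(signals, mic_positions, source_position, fs, c=343.0):
    """Align and average multichannel signals (illustrative sketch).

    signals:         array of shape (n_mics, n_samples)
    mic_positions:   array of shape (n_mics, 3), in metres
    source_position: array of shape (3,), in metres
    fs:              sampling rate in Hz
    c:               assumed speed of sound in m/s
    """
    signals = np.asarray(signals, dtype=float)
    mics = np.asarray(mic_positions, dtype=float)
    src = np.asarray(source_position, dtype=float)
    n_samples = signals.shape[1]

    # Propagation delay (seconds) from the source to each microphone.
    delays = np.linalg.norm(mics - src, axis=1) / c
    # Only relative delays matter; reference everything to the closest sensor.
    delays = delays - delays.min()

    # Advance each channel by its relative delay with a linear phase shift.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = np.fft.irfft(spectra * phase, n=n_samples, axis=1)

    # Additively combine the time-aligned channels (uniform weights).
    return aligned.mean(axis=0)
```

Because the phase shift is applied through the FFT, the delay compensation here is circular; a practical system would typically zero-pad or process short overlapping frames. The uniform averaging is the simplest choice of weights and makes no use of the noise statistics, which is what distinguishes this beamformer from the adaptive designs mentioned above.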