12 Microphone-Array-Based Speech Enhancement Using Neural Networks
Pasi Pertilä
Department of Signal Processing, Tampere University of Technology, Finland
12.1 Introduction
As discussed in Chapter 10, the noise reduction capacity of beamforming can in practice be rather modest, and post-filtering with time–frequency (TF) masking is often needed to further reduce the noise and interference in the beamformer’s output. The Wiener filter is theoretically optimal for noise suppression (in the mean squared error sense), but it requires the noise power spectrum (or that of the target signal) to be available during operation. This is problematic in typical real-world scenarios, where only the noisy target signal is observed and no explicit noise (or target) signal is available. A traditional speech enhancement approach is to update the estimates of the noise parameters during pauses in speech. In environments where the noise statistics do not change significantly before the next update is available, this approach can achieve good noise suppression. Different variants of this technique have been developed in the past (see, for example, Diethorn, 2004). However, relying on a voice activity detection scheme inherently increases the system’s complexity and decreases its robustness. Furthermore, real-world noise is often dynamic, which violates the assumption of noise stationarity. The errors made in the parameter estimates required by the approach ultimately ...
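The traditional approach described above can be illustrated with a minimal sketch. It is not the method developed in this chapter. It assumes that a single-channel short-time Fourier transform of the beamformer output and a per-frame voice activity decision are already available, and it estimates the target power by simple power subtraction. The function names, the smoothing factor `alpha`, and the gain floor `gain_floor` are hypothetical choices made for illustration only.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, gain_floor=0.05):
    """Per-bin Wiener gain S / (S + N) from noisy and noise power spectra."""
    # Assumption: target power S is approximated by power subtraction,
    # clipped at zero because only the noisy signal is observed.
    target_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    gain = target_psd / (target_psd + noise_psd + 1e-12)
    # A gain floor (hypothetical value) limits over-suppression.
    return np.maximum(gain, gain_floor)

def enhance(stft, speech_active, alpha=0.9, gain_floor=0.05):
    """Apply the gain frame by frame.

    stft: complex array of shape (frames, bins).
    speech_active: boolean array of shape (frames,), e.g. from a VAD.
    """
    # Assumption: the first frame contains noise only.
    noise_psd = np.abs(stft[0]) ** 2
    out = np.empty_like(stft)
    for t, frame in enumerate(stft):
        noisy_psd = np.abs(frame) ** 2
        if not speech_active[t]:
            # Update the noise estimate only while no speech is detected.
            noise_psd = alpha * noise_psd + (1.0 - alpha) * noisy_psd
        out[t] = wiener_gain(noisy_psd, noise_psd, gain_floor) * frame
    return out
```

The sketch makes the weaknesses noted above visible: the noise estimate is frozen while speech is active, so any change in the noise during speech, or any error in `speech_active`, propagates directly into the gain.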