4 Multichannel Source Activity Detection, Localization, and Tracking

Pasi Pertilä, Alessio Brutti, Piergiorgio Svaizer, and Maurizio Omologo

In the previous chapters, we have seen how both spectral and spatial properties of sound sources are relevant in describing an acoustic scene picked up by multiple microphones distributed in a real environment. This chapter provides an introduction to the most common problems and methods related to source activity detection, localization, and tracking based on a multichannel acquisition setup. Section 4.1 starts with a brief overview and defines some basic notions, in particular those related to time difference of arrival (TDOA) estimation and to the so‐called acoustic maps. Section 4.2 addresses the activity detection problem, starting from an overview of the most common methods and concluding with recent trends. Section 4.3 examines the localization problem for both static and moving sources, with some insights into the localization of multiple sources. Section 4.4 concludes the chapter.

Note that, although activity detection and localization apply to any kind of audio source, the signal of interest is often speech. Due to its spectral and temporal peculiarities, speech calls for specific algorithmic solutions that typically do not generalize to other sources. As an example, speech is by definition a nonstationary process, characterized by both long and very short pauses. Therefore, especially in Section 4.2 ...

