Joonas Nikunen and Tuomas Virtanen
Department of Signal Processing, Tampere University of Technology, Finland
Natural sound scenes often consist of multiple sound sources at different spatial locations. When such a scene is captured with a microphone, the resulting signal is a mixture of the sources, which makes it difficult to apply signal processing operations to the signals of individual sources. Processing spatial audio through source and object separation makes it possible to alter the rendition of the audio, for example by changing the spatial position of sound sources. In many applications, separating the sources from the mixture is itself the main goal, that is, separating the essential content (a performing artist, speech, and so forth) from interfering sources in the recording. These applications include assisted listening, robust content analysis of audio, modification of the sound scene for augmented reality, and three-dimensional audio in general. Spectrogram factorization combined with spatial analysis can be used to parameterize spatial audio in terms of sound objects, so that the audio content can be modified at the level of individual objects.
Audio signals consist of sound events that repeat over time, such as individual phonemes in speech and notes from musical instruments. The magnitude spectrogram, that is, the absolute value of the short-time Fourier ...