Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Second Edition
by Ben Gold, Nelson Morgan, Dan Ellis
August 2011
688 pages
English
Wiley-Interscience

CHAPTER 42

SPEAKER DIARIZATION

42.1 INTRODUCTION

As discussed in Chapter 8, for some applications it is useful to develop a classifier even without any labels: the so-called ‘unsupervised’ clustering task. For time-series data, it is often useful both to segment the data and to cluster the segments, for instance to associate each time segment with a particular source even if that source is unknown. In the case of speech, this operation is known as speaker diarization, namely, the determination of who spoke when [25]. In its typical instantiation, there are no pre-existing models for any of the speakers; models are learned on the fly, with no supervisory information. No information need be given about the underlying language, the spoken text, the amount of speech, the number of speakers, or the placement of the microphones. As with nearly all modern speech applications, the dominant underlying model is a statistical one, and, as in speaker verification, the basic representation is a Gaussian mixture model for each speaker, as described in Chapter 41. However, also as in speaker verification, state-of-the-art implementations are relatively complex. In this chapter we¹ will present the major methods in current use.
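To make the segment-then-cluster idea concrete, here is a minimal sketch of bottom-up (agglomerative) clustering with no prior speaker models, in Python with NumPy. It is not the chapter's method. It assumes a uniform initial segmentation into fixed-length chunks, models each cluster with a single full-covariance Gaussian rather than the per-speaker GMM described above, and uses a ΔBIC merge test (a common criterion in the diarization literature, swapped in here as an assumption) with a hypothetical penalty weight `lam`. All function names are illustrative. Note that the number of speakers is never supplied; it emerges from the point at which merging stops.

```python
import numpy as np


def half_n_logdet(x: np.ndarray) -> float:
    """Return (N/2) * log|Sigma| for the ML full-covariance Gaussian fit to the rows of x."""
    n, d = x.shape
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / n          # maximum-likelihood covariance estimate
    cov += 1e-6 * np.eye(d)                  # small floor keeps very short segments invertible
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * n * logdet


def delta_bic(x: np.ndarray, y: np.ndarray, lam: float = 1.0) -> float:
    """Delta-BIC of 'two Gaussians' versus 'one Gaussian' for segments x and y.

    Negative values favour merging: a single model explains both segments
    well enough that the extra parameters of two models are not justified.
    """
    d = x.shape[1]
    n = x.shape[0] + y.shape[0]
    num_params = d + d * (d + 1) / 2         # mean vector + full covariance
    penalty = 0.5 * lam * num_params * np.log(n)
    gain = half_n_logdet(np.vstack([x, y])) - half_n_logdet(x) - half_n_logdet(y)
    return gain - penalty


def bic_agglomerative(features: np.ndarray, seg_len: int = 100, lam: float = 1.0) -> np.ndarray:
    """Cluster fixed-length chunks of `features` (frames x dims); return one label per chunk."""
    clusters = [features[i:i + seg_len] for i in range(0, len(features), seg_len)]
    members = [[k] for k in range(len(clusters))]   # original chunk ids held by each cluster
    num_segments = len(clusters)

    while len(clusters) > 1:
        # Find the pair whose merge Delta-BIC favours most.
        best_score, best_i, best_j = np.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if score < best_score:
                    best_score, best_i, best_j = score, i, j
        if best_score >= 0:                   # no pair is worth merging: stop
            break
        clusters[best_i] = np.vstack([clusters[best_i], clusters[best_j]])
        members[best_i].extend(members[best_j])
        del clusters[best_j], members[best_j]  # best_j > best_i, so best_i is unaffected

    labels = np.empty(num_segments, dtype=int)
    for label, segs in enumerate(members):
        labels[segs] = label
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "conversation": speaker A, then B, then A again (2-D features).
    feats = np.vstack([
        rng.normal(0.0, 1.0, (300, 2)),
        rng.normal(5.0, 1.0, (300, 2)),
        rng.normal(0.0, 1.0, (300, 2)),
    ])
    print(bic_agglomerative(feats, seg_len=100))
```

On well-separated synthetic data like this, the chunks from both A turns are expected to end up sharing one label and the B chunks another. Because ΔBIC subtracts the penalty, raising `lam` makes merges more likely and so tends to produce fewer clusters; real features are far less separable, which is one reason practical systems are more elaborate.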

Unlike verification, speaker diarization does not require recognizing particular speakers, i.e., labeling speech with real names. It does, however, have its own challenges. In particular, diarization ...
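Because speakers are never named, a diarization system's output is naturally a list of time-stamped turns tagged with arbitrary labels that are meaningful only within one recording. The short sketch below shows that form. It assumes per-frame cluster indices are already available (for example, from a clustering step) and a 10 ms frame hop; the function name `labels_to_segments` and the `spkN` label format are hypothetical choices for illustration, not taken from the text.

```python
from typing import List, Tuple


def labels_to_segments(frame_labels: List[int], hop_s: float = 0.01) -> List[Tuple[float, float, str]]:
    """Collapse per-frame cluster indices into (start_s, end_s, speaker) turns.

    Speaker names are anonymous ("spk0", "spk1", ...): they say which frames
    belong together, not who the person is.
    """
    segments: List[Tuple[float, float, str]] = []
    if not frame_labels:
        return segments
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current turn at the end of input or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append((round(start * hop_s, 3), round(i * hop_s, 3), f"spk{frame_labels[start]}"))
            start = i
    return segments


if __name__ == "__main__":
    print(labels_to_segments([0, 0, 0, 1, 1, 0], hop_s=0.5))
    # prints [(0.0, 1.5, 'spk0'), (1.5, 2.5, 'spk1'), (2.5, 3.0, 'spk0')]
```

Rounding the boundaries to milliseconds only hides floating-point noise from the hop multiplication; it does not change which frames belong to which turn.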

ISBN: 9780470195369