In the previous chapters we provided a broad introduction to speech recognition methods, including training. However, a number of other methods for improving the statistical modeling of speech acoustics have proved to be advantageous. In this chapter, we will discuss two of the most important of these: adaptation, and common methods of discriminative training.


28.2.1 MAP and MLLR

We begin with a brief description of the adaptation problem which, for simplicity, we will frame in terms of speaker adaptation. There are many other goals for adaptation, for example channel adaptation, but the underlying principles are shared. We have at our disposal a baseline HMM that has been trained on a large corpus consisting of many (probably thousands of) hours of data collected from many (again probably thousands of) speakers. We think of these models as being speaker-independent and denote the model parameters Θ_SI. We are given a small collection (possibly minutes or at most hours) of training frames from a single target speaker, and we would like to produce speaker-dependent models, with model parameters Θ_SD, that perform better than the speaker-independent models on the target speaker's test data. In adaptation, instead of training new models from scratch, we use Θ_SI and the frames to estimate ...
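To make the setup concrete, here is a minimal sketch of one common way to combine speaker-independent parameters with a few target-speaker frames: the standard MAP update for a Gaussian mean. It is an illustration under simplifying assumptions, not the chapter's own formulation. It adapts a single Gaussian rather than a full HMM and assumes every frame is assigned to that Gaussian with occupancy 1. The function name `map_adapt_mean` and the prior weight `tau` are hypothetical names introduced for this example.

```python
def map_adapt_mean(mu_si, frames, tau=10.0):
    """MAP estimate of a Gaussian mean.

    mu_si  : speaker-independent mean (list of floats), used as the prior mean
    frames : adaptation frames from the target speaker (list of lists of floats),
             all assumed to belong to this one Gaussian (an assumption)
    tau    : prior weight; larger values keep the result closer to mu_si
    """
    n = len(frames)
    dim = len(mu_si)
    # Per-dimension sum of the adaptation frames.
    sums = [sum(f[d] for f in frames) for d in range(dim)]
    # Weighted interpolation between the prior mean and the adaptation data.
    return [(tau * mu_si[d] + sums[d]) / (tau + n) for d in range(dim)]


mu_si = [0.0, 0.0]
few = [[1.0, 2.0]] * 10     # very little adaptation data
many = [[1.0, 2.0]] * 990   # much more adaptation data

mu_few = map_adapt_mean(mu_si, few)
mu_many = map_adapt_mean(mu_si, many)
print(mu_few, mu_many)
```

With only 10 frames the adapted mean sits halfway between the speaker-independent mean and the target speaker's sample mean; with 990 frames it is almost entirely determined by the target speaker's data. This is the behavior one wants when adaptation data is scarce: fall back on the speaker-independent model when there is little evidence, and move toward the speaker-dependent estimate as evidence accumulates.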

