CHAPTER 28

ACOUSTIC MODEL TRAINING: FURTHER TOPICS

28.1 INTRODUCTION

In the previous chapters we provided a broad introduction to speech recognition methods, including training. However, there are a number of other methods for improving the statistical modeling of speech acoustics that have proved to be advantageous. In this chapter, we will discuss two of the most important of these: adaptation and common methods of discriminative training.

28.2 ADAPTATION

28.2.1 MAP and MLLR

We begin with a brief description of the adaptation problem which, for simplicity, we will frame in terms of speaker adaptation. There are many other goals for adaptation, for example channel adaptation, but the underlying principles are shared. We have at our disposal a baseline HMM that has been trained on a large corpus consisting of many (probably thousands of) hours of data collected from many (again probably thousands of) speakers. We think of these models as being speaker-independent and denote the model parameters ΘSI. We are given a small collection (possibly minutes or at most hours) of training frames from a single target speaker, and we would like to produce speaker-dependent models, with model parameters ΘSD, that perform better than the speaker-independent models on the target speaker's test data. In adaptation, instead of training new models from scratch, we use ΘSI and the adaptation frames to estimate ...
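To make the idea concrete, MAP adaptation of a Gaussian mean interpolates between the speaker-independent mean and the sample statistics of the adaptation frames, weighted by how much adaptation data is available. The sketch below is a minimal illustration, not the book's code; the function name, the prior-weight parameter `tau`, and the assumption of a single Gaussian component with known frame occupancies are all mine.

```python
import numpy as np

def map_adapt_mean(mu_si, frames, gammas, tau=10.0):
    """MAP re-estimate of one Gaussian mean (hypothetical helper).

    mu_si  : speaker-independent mean, shape (d,)
    frames : adaptation frames x_t, shape (T, d)
    gammas : per-frame occupancy probabilities for this Gaussian, shape (T,)
    tau    : prior weight; larger tau keeps the estimate closer to mu_si,
             so with little adaptation data the SI model dominates.
    """
    occ = gammas.sum()                 # total soft count of frames
    weighted_sum = gammas @ frames     # sum_t gamma_t * x_t
    return (tau * mu_si + weighted_sum) / (tau + occ)

# With tau equal to the soft count, the estimate sits halfway between
# the SI mean (here 0) and the data mean (here 1):
mu_sd = map_adapt_mean(np.zeros(2), np.ones((4, 2)), np.ones(4), tau=4.0)
print(mu_sd)  # → [0.5 0.5]
```

As the amount of adaptation data grows (occ >> tau), the estimate approaches the pure maximum-likelihood mean of the target speaker's frames, which is exactly the behavior one wants from MAP adaptation.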
