Adaptation and Discriminative Training of Acoustic Models

Yannick Estève, Paul Deléglise

University of Le Mans, France

11.1 Introduction

The main weakness of automatic speech-recognition (ASR) systems is their lack of robustness to variability. All the knowledge bases used in an ASR system are affected by this problem: the dictionary (that is, the list of words the system can recognize, along with their pronunciation variants), the language models, and the acoustic models. These knowledge bases, and most particularly the language and acoustic models, which are probabilistic in nature, depend heavily on the data used to estimate their parameters. This dependence of probabilistic models on their training corpora is made more significant by the high cost of building such corpora. Because of that cost, probabilistic models are in practice often used in application contexts that differ considerably from the context of their training data.

Such a mismatch between training data and application context causes the models to lose some of their precision and predictive power, which in turn degrades the quality of speech recognition. This well-known problem has led to the development of many techniques that aim to lessen its impact. Model adaptation consists of reducing the mismatch between probabilistic models and the data to which they are applied.

Noise is a cause of mismatch: it constitutes a variable phenomenon with potentially ...

