Chapter 5. Language Modeling

Katrin Kirchhoff

5.1. Introduction

Many applications in human language technology involve the use of a statistical language model: a model that specifies the a priori probability of a particular word sequence in the language of interest. Given an alphabet or inventory of units Σ and a sequence W = w_1 w_2 ... w_t ∈ Σ*, a language model can be used to compute the probability of W based on parameters previously estimated from a training set. Most commonly, the inventory Σ (also called the vocabulary) is the list of unique words encountered in the training data; however, as we will see in this chapter, selecting the units over which a language model should be defined can be a rather difficult problem, particularly in languages ...
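
As a rough illustration of the definition above, the following sketch estimates parameters from a tiny training set and then computes the probability of a word sequence W. It assumes an unsmoothed bigram approximation with maximum-likelihood estimates, which is only one possible choice and is not prescribed by the text; the boundary symbols <s> and </s>, the toy corpus, and the function names are all hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Collect bigram and context counts from tokenized training sentences."""
    bigrams = Counter()
    contexts = Counter()
    for sentence in sentences:
        # Hypothetical boundary symbols marking sentence start and end.
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts

def sequence_probability(words, bigrams, contexts):
    """Probability of a word sequence under the unsmoothed bigram estimates."""
    tokens = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            # Context never seen in training: no estimate, so return zero.
            return 0.0
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob

# Toy training set (assumed for illustration only).
training = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

# The vocabulary is the list of unique words seen in training.
vocabulary = sorted({word for sentence in training for word in sentence})

bigrams, contexts = train_bigram(training)
p_seen = sequence_probability(["the", "cat", "sat"], bigrams, contexts)
p_unseen = sequence_probability(["the", "bird", "sat"], bigrams, contexts)
```

Because no smoothing is applied, any sequence containing a word outside the training vocabulary receives probability zero. This is one reason the choice of units in Σ matters.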

