LINGUISTIC CATEGORIES FOR SPEECH RECOGNITION
In the past few chapters, we have introduced the fundamentals of feature extraction for ASR. The resulting features are gathered into feature vectors that are associated with linguistic categories during training, and then during recognition they are integrated over time to find the best linguistic sequence to assign to the observed sequence of feature vectors.
Here, we discuss the linguistic categories that have been used or proposed for use in ASR. Many of these have been alluded to previously (articulatory features, phones, phonemes, syllables, words, phrases, sentences, etc.); here we attempt to define terms more rigorously, particularly with regard to their application in speech engineering. We also highlight some of the research areas relating to the representation of these linguistic categories in ASR, particularly in the context of fluent speech.
23.2 PHONES AND PHONEMES
Words are a natural unit for modeling in ASR, particularly since there are many applications for which isolated words are an adequate form of input. Even for continuous speech, using complete words as the fundamental linguistic unit permits acoustic modeling of the word-specific context of the sounds used. Consider, for example, a digit recognition task (e.g., for credit card numbers): the system must recognize strings ...