may need to be explicitly represented in the dictionary for improved
recognition rates. This is true for both native and non-native variation.
Since the adaptation of the pronunciation dictionary, henceforth called
pronunciation adaptation, plays an important role in the case of non-
native speakers, we will discuss this problem in more detail in Section 9.5.2.
Speaker adaptation as well as pronunciation adaptation are likely to
improve recognition rates. Furthermore, it can be expected that these
improvements are additive if both methods are combined; the performance
of this combination is reviewed in Section 9.6. In Section 9.7 we summarize
some results on the performance of these techniques for cross-dialect
recognition for native speech.
Finally, Section 9.9 discusses other factors relevant to the development
of multilingual speech-recognition systems. We show that cultural factors
can have a significant impact on user interface design, and then discuss the
specific challenges of the developing world.
9.2 Characteristics of Non-native Speech
The differences between native and non-native speech can be quantified
in a variety of ways, all relevant to the problem of improving recognition
for non-native speakers. Differences in articulation, speaking rate, and
pause distribution can affect acoustic modeling, which looks for patterns
in phone pronunciation and duration and cross-word behavior. Differences
in disfluency distribution, word choice, syntax, and discourse style can
affect language modeling. And, of course, as these components are not
independent of one another, all affect overall recognizer performance.
In this chapter, we discuss both theoretical and corpus-based descriptions
of non-native speech.
9.2.1 Theoretical Models
It is a common assumption that pronunciation of an L2, or target language,
is directly related to the speaker's L1, and that the path to native-sounding
speech is a straight trajectory through phonetic and phonological space.
Research in second language acquisition (SLA), however, has shown
that the reality is far more complex. Learners of a language are on a journey
to proficiency, affected by developments in motor control, perception
of the L2, familiarity with languages outside the L1-L2 pair in question,
stress level, and a myriad of other influences that make the speech of
any one speaker at any one moment a dynamic idiolect. The broadly-
shared perception of accent—that is, the ability of native speakers across
demographic lines to identify a non-native accent as French or Spanish
or English—is difficult to reconcile with quantitative evidence from SLA
research showing that speakers of a particular L1 differ from one another
in more dimensions than they share when speaking an L2.
The theoretical model of SLA probably closest to the common lay view
of foreign accent is known as Contrastive Analysis (CA) (James, 1980). CA
theory claims that “speakers tend to hear another language and attempt to
produce utterances in it in terms of the structure of their own language, thus
accounting for their ‘accent’ in L2” (Ferguson, 1989). While CA is
intuitively very attractive, it has not been an effective method for predicting all
mistakes language learners make. Non-native speakers may approximate
with a sound from outside the phonetic inventory of either L1 or L2 (see,
for example, Brière [1996]). Other attempts to diagnose, explain, and predict
pronunciation errors in non-native speech include Error Analysis and
Transfer Analysis.
Contrastive Analysis: Standard realization of L2 is compared with standard
realization of L1 to predict deviations.
Error Analysis: Standard realization of L2 is compared with an intermediate
language (IL), which represents the speaker's model of L2 at any given time,
to predict deviations.
Transfer Analysis: A variant of CA that only claims to predict those
deviations that have their root in influences from L1, recognizing that
other deviations will occur.
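The comparison behind Contrastive Analysis can be illustrated with a minimal sketch. The phone inventories, the nearest-phone table, and the function name below are hypothetical and chosen only for illustration; they are not taken from any corpus or from the studies cited here. The sketch predicts that each L2 phone absent from L1 is replaced by a designated L1 phone, so by construction it cannot capture approximations that fall outside both inventories.

```python
# Toy phone inventories (hypothetical, for illustration only).
L1_PHONES = {"p", "t", "k", "s", "z", "i", "u", "a"}
L2_PHONES = {"p", "t", "k", "s", "z", "θ", "ð", "i", "u", "a", "æ"}

# Hypothetical table naming the closest L1 phone for some L2 phones.
NEAREST_L1 = {"θ": "s", "ð": "z", "æ": "a"}


def predict_substitutions(l1, l2, nearest):
    """Contrastive-Analysis-style prediction.

    Every L2 phone that is missing from the L1 inventory is predicted
    to be replaced by its listed nearest L1 phone. Phones without an
    entry in `nearest` map to None (no prediction).
    """
    missing = sorted(l2 - l1)  # L2 phones the speaker's L1 lacks
    return {phone: nearest.get(phone) for phone in missing}


if __name__ == "__main__":
    predictions = predict_substitutions(L1_PHONES, L2_PHONES, NEAREST_L1)
    for target, substitute in predictions.items():
        print(f"/{target}/ -> /{substitute}/")
```

A prediction table of this kind could seed pronunciation variants in a recognizer's dictionary, but it only reflects the L1-to-L2 contrast; it says nothing about the intermediate language that Error Analysis considers.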
Production of a foreign language, of course, is about much more than
phonetic similarity between L1 and L2. As speakers gain experience in a
language, certain pronunciations become fossilized while others continue
to change. Learners also change in their approach to the pragmatics of
speaking the language. Tarone et al. (1983) describe strategies that language
learners use to overcome difficulties in four major areas: phonological,
