Multilingual Speech Processing

276 CHAPTER 9. OTHER CHALLENGES: NON-NATIVE SPEECH

may need to be explicitly represented in the dictionary for improved recognition rates. This is true for both native and non-native variation. Since the adaptation of the pronunciation dictionary, henceforth called pronunciation adaptation, plays an important role in the case of non-native speakers, we will discuss this problem in more detail in Section 9.5.2.

Speaker adaptation as well as pronunciation adaptation are likely to improve recognition rates. Furthermore, it can be expected that these improvements are additive if both methods are combined; the performance of this combination is reviewed in Section 9.6. In Section 9.7 we summarize some results on the performance of these techniques for cross-dialect recognition for native speech.

Finally, Section 9.9 discusses other factors relevant to the development of multilingual speech-recognition systems. We show that cultural factors can have a significant impact on user interface design, and then discuss the specific challenges of the developing world.

9.2 Characteristics of Non-native Speech

The differences between native and non-native speech can be quantified in a variety of ways, all relevant to the problem of improving recognition for non-native speakers. Differences in articulation, speaking rate, and pause distribution can affect acoustic modeling, which looks for patterns in phone pronunciation and duration and cross-word behavior. Differences in disfluency distribution, word choice, syntax, and discourse style can affect language modeling. And, of course, as these components are not independent of one another, all affect overall recognizer performance.

In this chapter, we discuss both theoretical and corpus-based descriptions of non-native speech.

9.2.1 Theoretical Models

It is a common assumption that pronunciation of an L2, or target language, is directly related to the speaker’s L1, and that the path to native-sounding speech is a straight trajectory through phonetic and phonological space.


Research in second language acquisition (SLA), however, has shown that the reality is far more complex. Learners of a language are on a journey to proficiency, affected by developments in motor control, perception of the L2, familiarity with languages outside the L1-L2 pair in question, stress level, and a myriad of other influences that make the speech of any one speaker at any one moment a dynamic idiolect. The broadly shared perception of accent (that is, the ability of native speakers across demographic lines to identify a non-native accent as French or Spanish or English) is difficult to reconcile with quantitative evidence from SLA research showing that speakers of a particular L1 differ from one another in more dimensions than they share when speaking an L2.

The theoretical model of SLA probably closest to the common lay view of foreign accent is known as Contrastive Analysis (CA) (James, 1980). CA theory claims that “speakers tend to hear another language and attempt to produce utterances in it in terms of the structure of their own language, thus accounting for their ‘accent’ in L2” (Ferguson, 1989). While CA is intuitively very attractive, it has not been an effective method for predicting all mistakes language learners make. Non-native speakers may approximate with a sound from outside the phonetic inventory of either L1 or L2 (see, for example, Brière [1996]). Other attempts to diagnose, explain, and predict pronunciation errors in non-native speech include Error Analysis and Transfer Analysis.

Contrastive Analysis: Standard realization of L2 is compared with standard realization of L1 to predict deviations.

Error Analysis: Standard realization of L2 is compared with an intermediate language (IL), which represents the speaker’s model of L2 at any given time, to predict deviations.

Transfer Analysis: A variant of CA that only claims to predict those deviations that have their root in influences from L1, recognizing that other deviations will occur.
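
The contrast between these approaches can be made concrete with a small sketch. The Python below is an illustration under simplifying assumptions, not a method taken from the literature cited here: the two phone inventories are toy subsets invented for the example, and the function names and category labels are hypothetical. It treats Contrastive Analysis as a set difference between inventories and, in the spirit of Transfer Analysis, labels an observed substitution by where the produced sound comes from, including the case noted above of a sound outside both inventories.

```python
# Toy inventories (assumption): illustrative subsets only,
# not full phonological descriptions of any real language.
L1_INVENTORY = {"p", "t", "k", "s", "z", "v", "x"}
L2_INVENTORY = {"p", "t", "k", "s", "z", "v", "w", "θ", "ð"}


def predict_difficult_phones(l1, l2):
    """Simplified Contrastive Analysis: L2 phones that are absent from L1."""
    return set(l2) - set(l1)


def classify_substitution(target, produced, l1, l2):
    """Label one observed (target, produced) pair.

    The labels are hypothetical names chosen for this sketch.
    """
    if produced == target:
        return "target-like"
    if produced in l1:
        # The deviation can be traced to the speaker's native inventory.
        return "L1 transfer"
    if produced in l2:
        # The speaker used a different sound that exists only in the target language.
        return "L2-internal confusion"
    # Neither inventory explains the sound; CA alone would not predict it.
    return "outside both inventories"


if __name__ == "__main__":
    print(sorted(predict_difficult_phones(L1_INVENTORY, L2_INVENTORY)))
    for target, produced in [("θ", "s"), ("θ", "θ"), ("s", "w"), ("θ", "ɸ")]:
        label = classify_substitution(target, produced, L1_INVENTORY, L2_INVENTORY)
        print(target, "->", produced, ":", label)
```

A set difference captures only the simplest reading of CA. It says nothing about the intermediate language that Error Analysis compares against, which would require a per-speaker model that changes over time.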

Production of a foreign language, of course, is about much more than phonetic similarity between L1 and L2. As speakers gain experience in a language, certain pronunciations become fossilized while others continue to change. Learners also change in their approach to the pragmatics of speaking the language. Tarone et al. (1983) describe strategies that language learners use to overcome difficulties in four major areas: phonological,


Multilingual Speech Processing by Tanja Schultz, Katrin Kirchhoff
