been solved by a combination of technical and sociological means. Two
recent examples are the widespread uptake of cellular telephony, and sev-
eral successful AIDS-awareness campaigns. In both cases, the critical
success factor was the importance of the intervention to the lives of the
users by providing dramatically improved communications in the one case,
and limiting the spread of a debilitating disease in the other. If spoken inter-
faces can be shown to have comparable value in the developing world, one
can be confident that the hurdles to their development will be overcome.
9.10 Summary
In this chapter, we have examined characteristics of and approaches to
automatic recognition of non-native, accented, and dialectal speech. This
topic has received comparatively little attention in the past. For the real-world
use of speech-driven interfaces, systems need to be able to handle speech
other than purely native speech; the number of languages for which ASR
systems can be built will always be limited, and non-native words will occur
in many domains. To date, most ASR systems show a significant increase in
word error rates when faced with non-native speech.
In order to appropriately model non-native speech, we first have to
determine how non-native speech differs from native speech. We have
tried to give some insight into these differences and their origins. We have
shown that influences on non-native speech are complex, far exceeding the
commonly assumed sole influence of the mother tongue.
A particular challenge in non-native and dialectal speech is that a huge
number of permutations exist in terms of which language was spoken
with the accent of which other language. Detailed investigations are often
constrained by the lack of sufficient amounts of non-native or dialectal data.
We have also described how standard methods in native speech recog-
nition, such as specialized acoustic modeling, speaker adaptation, and
adaptation of the pronunciation dictionary, apply to the non-native case,
and what the particular challenges are for these methods. We have shown
that these methods can indeed improve recognition of non-native speech if
they are tailored to this special case.
Apart from the modeling issues described above, we have also investi-
gated how the design of different interfaces needs to be modified in order
to increase usability for non-native speakers, who often have particular
needs and problems when using automated systems. This is of particular
importance for speech-based interfaces in the developing world, where
cultural differences need to be taken into account if such systems are to
be deployed successfully.
To summarize, several approaches do improve the performance of speech
recognition systems when faced with non-native and accented speech.
Nevertheless, many questions remain unanswered, leaving plenty of room
for future research.