Speech recognition and speech synthesis technologies have enjoyed a
period of rapid progress in recent years, with an increasing number of
functional systems being developed for a diverse array of applications.
At the same time, this technology is being developed for fewer than
twenty of the world’s estimated four to eight thousand languages. This
disparity suggests that in the near future, demand for speech recognition and
speech synthesis technologies, and for the automated dialog, dictation, and
summarization systems that they support, will come to include an
increasingly large number of new languages. Given current globalization trends,
multilingual speech recognition and speech synthesis technologies, which
will enable applications such as speech-to-speech translation and the ability to
incorporate the voice of a speaker of one language into a synthesizer for a
different language (known as polyglot synthesis), will become increasingly
important.
Because current speech recognition and speech synthesis technologies
rely heavily on statistical methods, developing systems for a new language
often requires first constructing new speech corpora (databases). This in
turn requires many hours of recording and large amounts of funding;
consequently, one of the most important problems facing minority language
researchers will be how to significantly advance research on languages for
which there are either limited data resources or scant funds. Moreover, because all
languages have unique characteristics, technologies developed based on
models of one language cannot simply be ported “as-is” to other languages;
this process requires substantial modifications. These are a few of the major
handicaps facing current efforts to develop speech technology and extend
it into new areas and to new languages. If speech recognition and
synthesis systems could be easily and effectively ported between different
languages, a much greater number of people might share the benefits that
this technology has to offer.
This book covers state-of-the-art technologies for multilingual speech
processing. Specifically, it focuses on current research efforts in Europe
and America; it describes new speech corpora under development and
discusses multilingual speech recognition techniques from acoustic and
language modeling to multilingual dictionary construction. In the area
of language modeling, it even touches on new morphological
and lexical segmentation techniques. On the topic of multilingual
text-to-speech conversion, it discusses possible methods for generating voices
in new languages. Language identification techniques and approaches to
recognition problems involving non-native speakers are also discussed. The
last portion of the book is devoted to issues in speech-to-speech translation
and automated multilingual dialog systems.
Presently enrolled at my laboratory here at Tokyo Tech are not only
Japanese students but also students from eleven other countries, including
the United States, England, France, Spain, Iceland, Finland, Poland,
Switzerland, Thailand, Indonesia, and Brazil. Despite the large number of
languages represented, the ones for which we can easily obtain
comparatively large-scale spoken corpora are limited to widely researched
languages such as English, Japanese, and French. Developing new speech
recognition and synthesis systems for any other language is considerably
more expensive and time-consuming, mainly because researchers must first
construct new speech corpora. In our lab, for the purpose of
developing an automated dialog system for Icelandic (a language spoken
by approximately 300,000 people worldwide), we are currently conducting
research on automatically translating a series of written English
corpora into Icelandic. This is possible both because the English data is
abundantly available and because the two languages’ similarity in terms of
grammatical structure makes the idea more feasible than with many other
language pairings.
Constructing corpora is not, however, solely a problem for minority
languages. To expand the application of current speech recognition
systems beyond carefully read text from written
manuscripts, the ability to recognize spontaneous speech with a high degree
of precision will become essential. Currently available speech recognition
