Multilingual Speech Processing

Foreword

Speech recognition and speech synthesis technologies have enjoyed a

period of rapid progress in recent years, with an increasing number of

functional systems being developed for a diverse array of applications.

At the same time, this technology is only being developed for fewer than

twenty of the world’s estimated four to eight thousand languages. This

disparity suggests that in the near future, demand for speech recognition and

speech synthesis technologies, and the automated dialog, dictation, and

summarization systems that they support, will come to include an

increasingly large number of new languages. Due to current globalization trends,

multilingual speech recognition and speech synthesis technologies, which

will enable things like speech-to-speech translation, and the ability to

incorporate the voice of a speaker from one language into a synthesizer for a

different language (known as polyglot synthesis), will become increasingly

important.

Because current speech recognition and speech synthesis technologies

rely heavily on statistical methods, when faced with the challenge

of developing systems for new languages, it is often necessary to begin by

constructing new speech corpora (databases). This in turn requires many

hours of recording and large amounts of funding, and consequently, one

of the most important problems facing minority language researchers will

be how to significantly advance research on languages for which there

are either limited data resources or scant funds. Moreover, because all

languages have unique characteristics, technologies developed based on

models of one language cannot simply be ported “as-is” to other languages;

this process requires substantial modifications. These are a few of the major

handicaps facing current efforts to develop speech technology and extend

it into new areas and to new languages. If speech recognition and synthesis

systems could be easily and effectively ported between different

languages, a much greater number of people might share the benefits that

this technology has to offer.

This book spans the state-of-the-art technologies of multilingual speech

processing. Specifically, it focuses on current research efforts in Europe

and America; it describes new speech corpora under development and

discusses multilingual speech recognition techniques from acoustic and

language modeling to multilingual dictionary construction issues. Regarding

the issue of language modeling, it even touches on new morphological

and lexical segmentation techniques. On the topic of multilingual

text-to-speech conversion, it discusses possible methods for generating voices

in new languages. Language identification techniques and approaches to

recognition problems involving non-native speakers are also discussed. The

last portion of the book is devoted to issues in speech-to-speech translation

and automated multilingual dialog systems.

Presently enrolled at my laboratory here at Tokyo Tech are not only

Japanese students but also students from eleven other countries,

including the United States, England, France, Spain, Iceland, Finland, Poland,

Switzerland, Thailand, Indonesia, and Brazil. Despite this large number of

representative languages, the ones for which we are easily able to obtain

comparatively large-scale spoken corpora are limited to widely researched

languages like English, Japanese, and French. Developing new speech

recognition and synthesis systems for any other language is considerably more

expensive and time-consuming, mainly because researchers must begin

by first constructing new speech corpora. In our lab, for the purpose of

developing an automated dialog system for Icelandic (a language spoken

by approximately 300,000 people worldwide), we are currently conducting

research in an effort to automatically translate a series of written English

corpora into Icelandic. This is possible both because the English data is

abundantly available and because the two languages’ similarity in terms of

grammatical structure makes the idea more feasible than with many other

language pairings.

Constructing corpora is not, however, solely a problem for minority

languages. In order to expand the application of current speech recognition

systems, instead of just recognizing carefully read text from written

manuscripts, the ability to recognize spontaneous speech with a high degree

of precision will become essential. Currently available speech recognition

Multilingual Speech Processing by Tanja Schultz, Katrin Kirchhoff