INDEX 501
insights and open problems, 118–122
introduction to, 71–79
compact and maintainable systems, 78
context dependent acoustic modeling, 74–77
Hidden Markov Models (HMMs), 72–74
multiple language systems versus multilingual systems, 77–78
overview, 71–72
rapid language deployment, 78–79
language independent sound inventories and representations, 91–102
alternative representations, 93–96
overview, 91–92
phonetic coverage across languages, 99–102
phonetic sound inventories, 92–93
sound unit sharing across languages, 96–99
problems and challenges of, 79–91
gap between technology and language expertise, 91
lack of resources: uniform data in multiple languages, 80–83
language peculiarities, 83–90
multilingual corpora, 370–376
Basic Travel Expression Corpus (BTEC), 370
MT-Assisted Dialogs (MAD), 374–376
overview, 370
Spoken Language Database (SLDB), 372–374
multilingual data-collection efforts, 33–40
cost and dissemination, 39–40
issues for new languages, 35–37
transcription approaches and problems, 37–39
quick and careful transcription specifications, 37–38
transcription of unwritten varieties: case of Arabic Colloquials, 38–39
use of multilingual speech data, 34–35
multilingual dictionaries, 123–168
discussion, 166–168
generating pronunciations, 149–166
canonical pronunciations, 153–157
corpus-based validation, 164–166
overview, 149–153
phone sets and acoustic modeling, 162–164
pronunciation variants, 158–162
multilingual dictionaries, 125–129
overview, 123–125
vocabulary selection, 141–149
multilingual considerations, 146–149
overview, 141–142
spoken language-specific vocabulary items, 145–146
training data selection, 142–145
vocabulary changes over time, 142
word, definition of, 129–141
overview, 129–130
text normalization, 131–141
multilingual language modeling, 169–205
crosslingual comparisons: language modeling perspective, 177–193
languages lacking explicit word segmentation, 188–190
languages with very divergent written and spoken forms, 190–193
morphologically rich languages modeling, 180–188
overview, 177–180
crosslinguistic bootstrapping for language modeling, 193–199
adaptation of language models using crosslingual side-information, 195–199
overview, 193–194
discussion and concluding remarks, 203–205
model estimation for new domains and speaking styles, 174–177
language model adaptation, 176–177
overview, 174–176
overview, 169
statistical language modeling, 169–174
language models for new languages and multilingual applications, 173–174
overview, 169–173
truly multilingual speech recognition language models, 200–203
overview, 200
utterance level multilinguality, 200–201
word level multilinguality and code-switching, 201–203
multilingual machine translation, 350–351
multilingual phone repertory, 260
multilingual recognizer, 77
