Automatic Speech Recognition and Translation for Low Resource Languages
by L. Ashok Kumar, D. Karthika Renuka, Bharathi Raja Chakravarthi, Thomas Mandl
17 Multiview Learning-Based Speech Recognition for Low-Resource Languages
Aditya Kumar* and Jainath Yadav
Department of Computer Science, Central University of South Bihar, Gaya, Bihar, India
Abstract
Automatic speech recognition (ASR) requires substantial amounts of processed and unprocessed text, along with lexical, syntactic, and semantic resources. Low-resource languages (LRLs) typically lack these resources, which makes building ASR systems for them challenging. Multiview learning is a machine learning technique that has gained popularity in recent years for handling multi-modal or heterogeneous data, where several views or representations of the same data are available. In speech recognition, an utterance can be partitioned in multiple ways to obtain such views, for example time-domain partitioning (TDP), frequency-domain partitioning (FDP), spectrogram partitioning (SP), modality-specific partitioning (MSP), and multi-channel partitioning (MCP). Multiview learning seeks to improve the performance of a machine learning system by exploiting the complementary information across these views. This chapter examines the challenges and limitations of multiview learning in ASR and outlines future directions for addressing LRL issues. It includes state-of-the-art approaches that handle these issues and provides a comprehensive summary of information fusion approaches for LRLs.
Keywords: Speech recognition, ...
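To make the notion of "views" concrete, the sketch below (not code from the chapter; frame and FFT sizes are illustrative assumptions) derives two views of the same utterance that multiview learning could fuse: a time-domain view (TDP), obtained by splitting the waveform into overlapping frames, and a spectrogram view (SP), obtained by taking the magnitude FFT of each windowed frame.

```python
# Minimal sketch of deriving two "views" of one utterance for
# multiview learning: time-domain partitioning (TDP) and a
# spectrogram view (SP). NumPy only; all sizes are assumptions.
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Time-domain view: split the waveform into overlapping frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])

def spectrogram_view(frames):
    """Spectrogram view: magnitude FFT of each Hann-windowed frame."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))

# Synthetic 1-second "utterance" at 16 kHz (a 440 Hz tone).
sr = 16000
t = np.arange(sr) / sr
utterance = np.sin(2 * np.pi * 440.0 * t)

tdp_view = frame_signal(utterance, frame_len=400, hop=160)  # 25 ms / 10 ms
sp_view = spectrogram_view(tdp_view)

print(tdp_view.shape)  # (98, 400): 98 frames of 400 samples each
print(sp_view.shape)   # (98, 201): 98 frames of 201 frequency bins
```

A multiview model would feed each view to its own encoder and combine the resulting representations, so that complementary temporal and spectral cues both contribute to recognition.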