17 Multiview Learning-Based Speech Recognition for Low-Resource Languages

Aditya Kumar* and Jainath Yadav

Department of Computer Science, Central University of South Bihar, Gaya, Bihar, India

Abstract

Automatic speech recognition (ASR) requires substantial amounts of processed and unprocessed text, along with lexical, syntactic, and semantic resources. Low-resource languages (LRLs) typically lack these resources, which makes building ASR systems for them challenging. Multiview learning is a technique that has gained popularity in machine learning in recent years; it handles multi-modal or heterogeneous data in which several views, or representations, of the same data are available. In speech recognition, an utterance can be partitioned in multiple ways to obtain such views, including time-domain partitioning (TDP), frequency-domain partitioning (FDP), spectrogram partitioning (SP), modality-specific partitioning (MSP), and multi-channel partitioning (MCP). Multiview learning seeks to improve the performance of a machine learning system by exploiting the complementary information these views carry. This chapter examines the challenges and limitations of multiview learning in ASR and outlines future directions for addressing LRL issues. It also surveys state-of-the-art approaches to these issues, providing a comprehensive summary of information fusion methods for handling LRLs.
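As a minimal sketch of the first two partitioning schemes named above, the snippet below derives multiple views of one utterance: TDP splits the raw waveform into contiguous time segments, and FDP splits its magnitude spectrum into frequency bands. The function names and the synthetic test tone are illustrative, not from the chapter.

```python
import numpy as np

def time_domain_views(signal, n_views):
    """Time-domain partitioning (TDP): split the waveform into
    contiguous, roughly equal-length segments, one per view."""
    return np.array_split(signal, n_views)

def frequency_domain_views(signal, n_views):
    """Frequency-domain partitioning (FDP): split the magnitude
    spectrum into contiguous frequency bands, one per view."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.array_split(spectrum, n_views)

# Toy utterance: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
utterance = np.sin(2 * np.pi * 440 * t)

tdp_views = time_domain_views(utterance, 4)   # four time segments
fdp_views = frequency_domain_views(utterance, 4)  # four frequency bands
print(len(tdp_views), len(fdp_views))
```

Each view would then feed its own feature extractor or encoder, and a fusion stage would combine the per-view representations so the recognizer can exploit their complementary information.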

Keywords: Speech recognition, ...

This chapter appears in Automatic Speech Recognition and Translation for Low Resource Languages, available on the O'Reilly learning platform.
