13. Transformer-Based Multilingual Automatic Speech Recognition (ASR) Model for Dravidian Languages

Divi Eswar Chowdary, Rahul Ganesan, Harsha Dabbara, G. Jyothish Lal* and Premjith B.

Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India

Abstract

India has a rich linguistic diversity, with over 1600 indigenous languages, many of which are experiencing a cultural decline due to limited accessibility, awareness, and information. In recent years, sequence-modelling techniques such as recurrent neural networks (RNNs) and hidden Markov models (HMMs) have been applied to automatic speech recognition (ASR) for low-resource languages, but their performance is limited by the availability of quality datasets, and the scarcity of high-quality data is a particular obstacle for Indian languages. Transformers, on the other hand, have emerged as a popular and effective deep learning architecture for ASR because their pre-trained parameters can be fine-tuned for new languages and tasks. OpenAI's Whisper model is an ASR system trained on a vast amount of multilingual and multitask data collected from the web, and owing to its robustness across languages and tasks it is widely considered a new benchmark for ASR. While the Whisper model does recognize some Indian languages, it has no training specific to Dravidian languages. However, these languages are of particular interest due to their common roots with other Indian languages and the unique challenges they pose when spoken natively ...
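As context for the abstract above, the minimal sketch below shows how a pre-trained Whisper checkpoint can be applied out of the box to a Dravidian-language recording using the open-source openai-whisper package; the checkpoint size, audio file name, and language code ("ta" for Tamil) are illustrative assumptions, not the chapter's exact fine-tuning setup.

```python
# Minimal sketch: zero-shot transcription of a Tamil clip with a pre-trained
# multilingual Whisper checkpoint (assumed file name and checkpoint size).
import whisper

# Load a pre-trained multilingual checkpoint; larger checkpoints trade speed for accuracy.
model = whisper.load_model("small")

# Transcribe, forcing the decoder to the target language instead of relying on auto-detection.
result = model.transcribe("tamil_sample.wav", language="ta", task="transcribe")

print(result["text"])
```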
