6 ASR Models from Conventional Statistical Models to Transformers and Transfer Learning

Elizabeth Sherly*, Leena G. Pillai and Kavya Manohar

Digital University of Kerala, Trivandrum, Kerala, India

Abstract

Automatic speech recognition (ASR) systems have made remarkable progress in recent years, resulting in voice assistants such as Alexa, Google Assistant, Siri, and Cortana. Many computational models have benefited from the data available for resource-rich languages and have achieved higher accuracy in ASR systems as a result. The journey from conventional statistical models to recent state-of-the-art deep learning models can also inform an effective strategy for linguistically diverse languages. However, building ASR systems for low-resource languages, which include most Indian languages, is challenging. Techniques for languages with little data are daunting to develop and have received less attention in the ASR research community. The field started with classical statistical models such as the generative hidden Markov model-Gaussian mixture model (HMM-GMM) and then moved to discriminative models such as the support vector machine (SVM); the ASR community welcomed these as proven models, but they are shallow in nature. The emergence of deep learning then brought considerable reductions in word error rate, and deep learning is now prominent in ASR models. This chapter highlights how deep learning models can be used in low-resource languages for better performance and ...

From Automatic Speech Recognition and Translation for Low Resource Languages.