14 Language Detection Based on Audio for Indian Languages

Amogh A. M., A. Hari Priya, Thanvitha Sai Kanchumarti, Likhitha Ram Bommilla and Rajeshkannan Regunathan*

School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India

Abstract

The Indian subcontinent is linguistically diverse, with 22 officially recognized languages and countless dialects. Each language has its own distinct accents and dialects, which makes it difficult to determine which language is being spoken. Spoken language identification (SLID) is therefore a particularly hard task in this setting. The main objective of this chapter is to address the problem with a deep learning model that correctly identifies different Indian languages while expanding the number of languages that can be identified. The chapter proposes a model for recognizing Indian languages such as Hindi, Kannada, Bengali, Gujarati, Tamil, Telugu, Marathi, Malayalam, Punjabi, and Urdu. These languages were chosen because they are widely spoken in India and because a few of them are similar to one another; the proposed model can still predict those similar languages correctly. The model takes audio files as input and preprocesses them to produce spectrograms of the speech signals. Spectrograms are useful representations of audio because they show how a signal’s frequency content varies over time. Following ...
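To illustrate what a spectrogram captures, the following is a minimal sketch and not the chapter's actual preprocessing pipeline, which is not shown in this excerpt. It computes a magnitude spectrogram with a plain short-time Fourier transform, using only the Python standard library. The function names `hann` and `spectrogram`, the frame length, the hop size, and the synthetic 1 kHz tone standing in for speech are all assumptions chosen for illustration; a practical system would more likely use an FFT-based audio library.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (reduces spectral leakage)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin.

    Hypothetical helper for illustration; uses a direct DFT, which is slow
    but keeps the example dependency-free.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Take one overlapping frame and taper it with the window.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        # Only bins 0..frame_len/2 are needed for a real-valued signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Assumed test signal: a pure 1 kHz tone sampled at 8 kHz (not real speech).
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)

# The strongest bin in the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames x", len(spec[0]), "bins; peak near", peak_hz, "Hz")
```

Each row of `spec` describes the frequency content of one short time slice, so stacking the rows gives the time-varying picture that the abstract refers to. For speech, the energy pattern changes from frame to frame, and that two-dimensional pattern is what an image-style deep learning model can be trained on.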
