
Automatic Speech Recognition and Translation for Low Resource Languages

ISBN: 9781394213580

by L. Ashok Kumar, D. Karthika Renuka, Bharathi Raja Chakravarthi, Thomas Mandl

April 2024

Intermediate to advanced

496 pages

13h 10m

English

Wiley-Scrivener

Cover
Table of Contents
Series Page
Title Page
Copyright Page
Dedication Page
Foreword
Preface
Acknowledgement
1 A Hybrid Deep Learning Model for Emotion Conversion in Tamil Language
1.1 Introduction; 1.2 Dataset Collection and Database Preparation; 1.3 Pre-Trained CNN Architectural Models; 1.4 Proposed Method for Emotion Transformation; 1.5 Synthesized Speech Evaluation; 1.6 Conclusion; References
2 Attention-Based End-to-End Automatic Speech Recognition System for Vulnerable Individuals in Tamil
2.1 Introduction; 2.2 Related Work; 2.3 Dataset Description; 2.4 Implementation; 2.5 Results and Discussion; 2.6 Conclusion; References
3 Speech-Based Dialect Identification for Tamil
3.1 Introduction; 3.2 Literature Survey; 3.3 Proposed Methodology; 3.4 Experimental Setup and Results; 3.5 Conclusion; References
4 Language Identification Using Speech Denoising Techniques: A Review
4.1 Introduction; 4.2 Speech Denoising and Language Identification; 4.3 The Noisy Speech Signal is Denoised Using Temporal and Spectral Processing; 4.4 The Denoised Signal is Classified to Identify the Language Spoken Using Recent Machine Learning Algorithm; 4.5 Conclusion; References
5 Domain Adaptation-Based Self-Supervised ASR Models for Low-Resource Target Domain
5.1 Introduction; 5.2 Literature Survey; 5.3 Dataset Description; 5.4 Self-Supervised ASR Model; 5.5 Domain Adaptation for Low-Resource Target Domain; 5.6 Implementation of Domain Adaptation on wav2vec2 Model for Low-Resource Target Domain; 5.7 Results Analysis; 5.8 Conclusion; Acknowledgements; References
6 ASR Models from Conventional Statistical Models to Transformers and Transfer Learning
6.1 Introduction; 6.2 Preprocessing; 6.3 Feature Extraction; 6.4 Generative Models for ASR; 6.5 Discriminative Models for ASR; 6.6 Deep Architectures for Low-Resource Languages; 6.7 The DNN-HMM Hybrid System; 6.8 Summary; References
7 Syllable-Level Morphological Segmentation of Kannada and Tulu Words
7.1 Introduction; 7.2 Related Work; 7.3 Corpus Construction and Annotation; 7.4 Methodology; 7.5 Experiments and Results; 7.6 Conclusion and Future Work; References
8 A New Robust Deep Learning-Based Automatic Speech Recognition and Machine Transition Model for Tamil and Gujarati
8.1 Introduction; 8.2 Literature Survey; 8.3 Proposed Architecture; 8.4 Experimental Setup; 8.5 Results; 8.6 Conclusion; References
9 Forensic Voice Comparison Approaches for Low-Resource Languages
9.1 Introduction; 9.2 Challenges of Forensic Voice Comparison; 9.3 Motivation; 9.4 Review on Forensic Voice Comparison Approaches; 9.5 Low-Resource Language Datasets; 9.6 Applications of Forensic Voice Comparison; 9.7 Future Research Scope; 9.8 Conclusion; References
10 CoRePooL—Corpus for Resource-Poor Languages: Badaga Speech Corpus
10.1 Introduction; 10.2 CoRePooL; 10.3 Benchmarking; 10.4 Conclusion; Acknowledgement; References
11 Bridging the Linguistic Gap: A Deep Learning-Based Image-to-Text Converter for Ancient Tamil with Web Interface
11.1 Introduction; 11.2 The Historical Significance of Ancient Tamil Scripts; 11.3 Realization Process; 11.4 Dataset Preparation; 11.5 Convolution Neural Network; 11.6 Webpage with Multilingual Translator; 11.7 Results and Discussions; 11.8 Conclusion and Future Work; References
12 Voice Cloning for Low-Resource Languages: Investigating the Prospects for Tamil
12.1 Introduction; 12.2 Literature Review; 12.3 Dataset; 12.4 Methodology; 12.5 Results and Discussion; 12.6 Conclusion; References
13 Transformer-Based Multilingual Automatic Speech Recognition (ASR) Model for Dravidian Languages
13.1 Introduction; 13.2 Literature Review; 13.3 Dataset Description; 13.4 Methodology; 13.5 Experimentation Results and Analysis; 13.6 Conclusion; References
14 Language Detection Based on Audio for Indian Languages
14.1 Introduction; 14.2 Literature Review; 14.3 Language Detector System; 14.4 Experiments and Outcomes; 14.5 Conclusion; References
15 Strategies for Corpus Development for Low-Resource Languages: Insights from Nepal
15.1 Low-Resource Languages and the Constraints; 15.2 Language Resources Map for the Languages of Nepal; 15.3 Unicode Inception and Advent in Nepal; 15.4 Speech and Translation Initiatives; 15.5 Corpus Development Efforts—Sharing Our Experiences; 15.6 Constraints to Competitive Language Technology Research for Nepali and Nepal’s Languages; 15.7 Roadmap for the Future; 15.8 Conclusion; References
16 Deep Neural Machine Translation (DNMT): Hybrid Deep Learning Architecture-Based English-to-Indian Language Translation
16.1 Introduction; 16.2 Literature Survey; 16.3 Background; 16.4 Proposed System; 16.5 Experimental Setup and Results Analysis; 16.6 Conclusion and Future Work; References
17 Multiview Learning-Based Speech Recognition for Low-Resource Languages
17.1 Introduction; 17.2 Approaches of Information Fusion in ASR; 17.3 Partition-Based Multiview Learning; 17.4 Data Augmentation Techniques; 17.5 Conclusion; References
18 Automatic Speech Recognition Based on Improved Deep Learning
18.1 Introduction; 18.2 Literature Review; 18.3 Proposed Methodology; 18.4 Results and Discussion; 18.5 Conclusion; References
19 Comprehensive Analysis of State-of-the-Art Approaches for Speaker Diarization
19.1 Introduction; 19.2 Generic Model of Speaker Diarization System; 19.3 Review of Existing Speaker Diarization Techniques; 19.4 Challenges; 19.5 Applications; 19.6 Conclusion; References
20 Spoken Language Translation in Low-Resource Language
20.1 Introduction; 20.2 Related Work; 20.3 MT Algorithms; 20.4 Dataset Collection; 20.5 Conclusion; References
Index
End User License Agreement

Content preview from Automatic Speech Recognition and Translation for Low Resource Languages

10 CoRePooL—Corpus for Resource-Poor Languages: Badaga Speech Corpus

Barathi Ganesh H.B.¹,², Jyothish Lal G.¹*, Jairam R.¹,², Soman K.P.¹, Kamal N.S.² and Sharmila B.³

¹Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India

²RBG.AI, Resilience Business Grids LLP, SREC Incubation Center, Coimbatore, Tamil Nadu, India

³Sri Ramakrishna Engineering College, Coimbatore, Tamil Nadu, India

Abstract

This chapter presents CoRePooL, the Corpus for Resource-Poor Languages. Deep learning has accelerated voice-based human-machine interaction applications, but a lack of resources limits how well these applications scale to resource-poor languages. In CoRePooL version 0.1.0, we released 420 minutes of monolingual supervised corpus and 968 minutes of multilingual unsupervised corpus for Badaga, a language of the Dravidian family. The annotations in the supervised corpus support speech-to-text, text-to-speech, translation, gender identification, and speaker identification. The unsupervised corpus is intended for self-supervised algorithms that compute latent representations. We also provide baselines for all of these tasks by fine-tuning foundation models on the released corpus. The code, models, and data are publicly available at https://github.com/rbg-research/CoRePooL.

Keywords: CoRePooL, Badaga language, speech-to-text, text-to-speech, translation, gender identification, speaker identification ...
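
To put the corpus sizes quoted in the abstract on a more familiar scale, the short Python sketch below converts the two stated durations to hours and computes the supervised share of the total. Only the two minute counts come from the abstract; the variable and function names are illustrative assumptions and are not part of the CoRePooL release.

```python
# Durations stated in the abstract for CoRePooL version 0.1.0, in minutes.
SUPERVISED_MIN = 420      # monolingual supervised corpus
UNSUPERVISED_MIN = 968    # multilingual unsupervised corpus


def to_hours(minutes):
    """Convert a duration in minutes to hours."""
    return minutes / 60


# Total audio and the fraction of it that carries annotations.
total_min = SUPERVISED_MIN + UNSUPERVISED_MIN
supervised_share = SUPERVISED_MIN / total_min

print(f"supervised:   {to_hours(SUPERVISED_MIN):.2f} h")
print(f"unsupervised: {to_hours(UNSUPERVISED_MIN):.2f} h")
print(f"total:        {to_hours(total_min):.2f} h")
print(f"supervised share of total: {supervised_share:.1%}")
```

By this arithmetic the release holds about 23 hours of audio, of which 7 hours (roughly 30%) are annotated.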
