Automatic Speech Recognition and Translation for Low Resource Languages

ISBN: 9781394213580

by L. Ashok Kumar, D. Karthika Renuka, Bharathi Raja Chakravarthi, Thomas Mandl

April 2024

Intermediate to advanced

496 pages

13h 10m

English

Wiley-Scrivener

Cover
Table of Contents
Series Page
Title Page
Copyright Page
Dedication Page
Foreword
Preface
Acknowledgement
1 A Hybrid Deep Learning Model for Emotion Conversion in Tamil Language
  1.1 Introduction
  1.2 Dataset Collection and Database Preparation
  1.3 Pre-Trained CNN Architectural Models
  1.4 Proposed Method for Emotion Transformation
  1.5 Synthesized Speech Evaluation
  1.6 Conclusion
  References
2 Attention-Based End-to-End Automatic Speech Recognition System for Vulnerable Individuals in Tamil
  2.1 Introduction
  2.2 Related Work
  2.3 Dataset Description
  2.4 Implementation
  2.5 Results and Discussion
  2.6 Conclusion
  References
3 Speech-Based Dialect Identification for Tamil
  3.1 Introduction
  3.2 Literature Survey
  3.3 Proposed Methodology
  3.4 Experimental Setup and Results
  3.5 Conclusion
  References
4 Language Identification Using Speech Denoising Techniques: A Review
  4.1 Introduction
  4.2 Speech Denoising and Language Identification
  4.3 The Noisy Speech Signal is Denoised Using Temporal and Spectral Processing
  4.4 The Denoised Signal is Classified to Identify the Language Spoken Using Recent Machine Learning Algorithm
  4.5 Conclusion
  References
5 Domain Adaptation-Based Self-Supervised ASR Models for Low-Resource Target Domain
  5.1 Introduction
  5.2 Literature Survey
  5.3 Dataset Description
  5.4 Self-Supervised ASR Model
  5.5 Domain Adaptation for Low-Resource Target Domain
  5.6 Implementation of Domain Adaptation on wav2vec2 Model for Low-Resource Target Domain
  5.7 Results Analysis
  5.8 Conclusion
  Acknowledgements
  References
6 ASR Models from Conventional Statistical Models to Transformers and Transfer Learning
  6.1 Introduction
  6.2 Preprocessing
  6.3 Feature Extraction
  6.4 Generative Models for ASR
  6.5 Discriminative Models for ASR
  6.6 Deep Architectures for Low-Resource Languages
  6.7 The DNN-HMM Hybrid System
  6.8 Summary
  References
7 Syllable-Level Morphological Segmentation of Kannada and Tulu Words
  7.1 Introduction
  7.2 Related Work
  7.3 Corpus Construction and Annotation
  7.4 Methodology
  7.5 Experiments and Results
  7.6 Conclusion and Future Work
  References
8 A New Robust Deep Learning-Based Automatic Speech Recognition and Machine Transition Model for Tamil and Gujarati
  8.1 Introduction
  8.2 Literature Survey
  8.3 Proposed Architecture
  8.4 Experimental Setup
  8.5 Results
  8.6 Conclusion
  References
9 Forensic Voice Comparison Approaches for Low-Resource Languages
  9.1 Introduction
  9.2 Challenges of Forensic Voice Comparison
  9.3 Motivation
  9.4 Review on Forensic Voice Comparison Approaches
  9.5 Low-Resource Language Datasets
  9.6 Applications of Forensic Voice Comparison
  9.7 Future Research Scope
  9.8 Conclusion
  References
10 CoRePooL—Corpus for Resource-Poor Languages: Badaga Speech Corpus
  10.1 Introduction
  10.2 CoRePooL
  10.3 Benchmarking
  10.4 Conclusion
  Acknowledgement
  References
11 Bridging the Linguistic Gap: A Deep Learning-Based Image-to-Text Converter for Ancient Tamil with Web Interface
  11.1 Introduction
  11.2 The Historical Significance of Ancient Tamil Scripts
  11.3 Realization Process
  11.4 Dataset Preparation
  11.5 Convolution Neural Network
  11.6 Webpage with Multilingual Translator
  11.7 Results and Discussions
  11.8 Conclusion and Future Work
  References
12 Voice Cloning for Low-Resource Languages: Investigating the Prospects for Tamil
  12.1 Introduction
  12.2 Literature Review
  12.3 Dataset
  12.4 Methodology
  12.5 Results and Discussion
  12.6 Conclusion
  References
13 Transformer-Based Multilingual Automatic Speech Recognition (ASR) Model for Dravidian Languages
  13.1 Introduction
  13.2 Literature Review
  13.3 Dataset Description
  13.4 Methodology
  13.5 Experimentation Results and Analysis
  13.6 Conclusion
  References
14 Language Detection Based on Audio for Indian Languages
  14.1 Introduction
  14.2 Literature Review
  14.3 Language Detector System
  14.4 Experiments and Outcomes
  14.5 Conclusion
  References
15 Strategies for Corpus Development for Low-Resource Languages: Insights from Nepal
  15.1 Low-Resource Languages and the Constraints
  15.2 Language Resources Map for the Languages of Nepal
  15.3 Unicode Inception and Advent in Nepal
  15.4 Speech and Translation Initiatives
  15.5 Corpus Development Efforts—Sharing Our Experiences
  15.6 Constraints to Competitive Language Technology Research for Nepali and Nepal’s Languages
  15.7 Roadmap for the Future
  15.8 Conclusion
  References
16 Deep Neural Machine Translation (DNMT): Hybrid Deep Learning Architecture-Based English-to-Indian Language Translation
  16.1 Introduction
  16.2 Literature Survey
  16.3 Background
  16.4 Proposed System
  16.5 Experimental Setup and Results Analysis
  16.6 Conclusion and Future Work
  References
17 Multiview Learning-Based Speech Recognition for Low-Resource Languages
  17.1 Introduction
  17.2 Approaches of Information Fusion in ASR
  17.3 Partition-Based Multiview Learning
  17.4 Data Augmentation Techniques
  17.5 Conclusion
  References
18 Automatic Speech Recognition Based on Improved Deep Learning
  18.1 Introduction
  18.2 Literature Review
  18.3 Proposed Methodology
  18.4 Results and Discussion
  18.5 Conclusion
  References
19 Comprehensive Analysis of State-of-the-Art Approaches for Speaker Diarization
  19.1 Introduction
  19.2 Generic Model of Speaker Diarization System
  19.3 Review of Existing Speaker Diarization Techniques
  19.4 Challenges
  19.5 Applications
  19.6 Conclusion
  References
20 Spoken Language Translation in Low-Resource Language
  20.1 Introduction
  20.2 Related Work
  20.3 MT Algorithms
  20.4 Dataset Collection
  20.5 Conclusion
  References
Index
End User License Agreement

19 Comprehensive Analysis of State-of-the-Art Approaches for Speaker Diarization

Trisiladevi C. Nagavi*, Samanvitha S., Shreya Sudhanva, Sukirth Shivakumar and Vibha Hullur

S. J. College of Engineering, JSS Science and Technology University, Mysore, Karnataka, India

Abstract

Speaker diarization is the task of comparing, recognizing, and separating the segments of an audio recording according to the identity of the speaker, that is, determining who spoke when. This chapter analyzes different ways of achieving this. Speaker diarization is likely to become important in fields such as education, healthcare, forensics, smart traffic management, and media. The process involves several steps, each of which can be accomplished using different models: voice activity detection, feature extraction, segmentation, embedding extraction, and clustering. Voice activity detection can be performed using Simulink in MATLAB, software such as Audacity or Webrtcvad, or deep learning methods. Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) are well-known features for speech feature extraction. Segmentation can be achieved using metric-based approaches or deep neural networks. Several Python frameworks are available for embedding extraction, depending on the type of vectors ...
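The five-stage pipeline listed above can be illustrated with a minimal, standard-library-only Python sketch. Every stage uses a deliberately simple substitute rather than the tools the chapter names: a frame-energy threshold stands in for Webrtcvad or a deep learning VAD, each contiguous speech region is assumed to contain a single speaker (no metric-based or DNN segmentation), a two-number (log energy, zero-crossing rate) vector stands in for MFCC/LPCC features and learned embeddings, and average-linkage agglomerative clustering with a distance threshold groups segments into speakers. All function names, thresholds, and the synthetic sine-tone "speakers" are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# Toy diarization pipeline (assumption: every stage is a simplified stand-in):
#   VAD          -> frame energy threshold
#   segmentation -> one segment per contiguous speech region (one speaker each)
#   embedding    -> (log mean-square energy, zero-crossing rate)
#   clustering   -> average-linkage agglomerative clustering with a threshold


def frame_signal(x: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Split the signal into non-overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


def energy_vad(frames, threshold: float) -> List[bool]:
    """Mark a frame as speech when its mean-square energy exceeds the threshold."""
    return [sum(s * s for s in f) / len(f) > threshold for f in frames]


def speech_regions(is_speech: List[bool]) -> List[Tuple[int, int]]:
    """Group consecutive speech frames into (start, end) frame runs, end exclusive."""
    regions, start = [], None
    for i, flag in enumerate(is_speech + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i))
            start = None
    return regions


def embed(x: Sequence[float], frame_len: int, region: Tuple[int, int]) -> Tuple[float, float]:
    """Hypothetical segment embedding: (log mean-square energy, zero-crossing rate)."""
    s = x[region[0] * frame_len:region[1] * frame_len]
    energy = sum(v * v for v in s) / len(s)
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(s, s[1:]))
    return (math.log(energy + 1e-12), crossings / max(len(s) - 1, 1))


def agglomerative(embs: List[Tuple[float, float]], threshold: float) -> List[int]:
    """Average-linkage clustering; stop when the closest pair is farther than threshold."""
    clusters = [[i] for i in range(len(embs))]

    def avg_dist(c1, c2):
        return sum(math.dist(embs[i], embs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min((avg_dist(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters)) for q in range(p + 1, len(clusters)))
        if d > threshold:
            break
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(embs)
    # Number speakers by order of first appearance.
    for k, c in enumerate(sorted(clusters, key=min)):
        for i in c:
            labels[i] = k
    return labels


def diarize(x, sr, frame_ms=20, vad_threshold=1e-3, merge_threshold=1.0):
    """Return (start_s, end_s, speaker_label) tuples; all thresholds are illustrative."""
    frame_len = max(1, int(sr * frame_ms / 1000))
    regions = speech_regions(energy_vad(frame_signal(x, frame_len), vad_threshold))
    labels = agglomerative([embed(x, frame_len, r) for r in regions], merge_threshold)
    return [(s * frame_len / sr, e * frame_len / sr, f"spk{k}")
            for (s, e), k in zip(regions, labels)]


def tone(freq, amp, seconds, sr):
    """Hypothetical synthetic 'speaker': a pure sine tone."""
    return [amp * math.sin(2 * math.pi * freq * n / sr) for n in range(int(seconds * sr))]


if __name__ == "__main__":
    sr = 8000
    gap = [0.0] * (sr // 2)  # 0.5 s of silence
    a, b = tone(200, 1.0, 1, sr), tone(1000, 0.3, 1, sr)
    for start, end, spk in diarize(gap + a + gap + b + gap + a + gap, sr):
        print(f"{start:.2f}-{end:.2f} s  {spk}")
```

On the synthetic signal in the `__main__` block, the two 200 Hz regions receive the same label and the 1000 Hz region a different one. Agglomerative clustering with a stopping threshold is used in this sketch because it does not require the number of speakers to be known in advance; the threshold value itself is an arbitrary assumption tuned to the toy features.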
