Automatic Speech Recognition and Translation for Low Resource Languages
by L. Ashok Kumar, D. Karthika Renuka, Bharathi Raja Chakravarthi, Thomas Mandl
Chapter 15. Strategies for Corpus Development for Low-Resource Languages: Insights from Nepal
Bal Krishna Bal¹*, Balaram Prasain², Rupak Raj Ghimire¹ and Praveen Acharya³
¹Information and Language Processing Research Lab, Department of Computer Science and Engineering, Kathmandu University, Dhulikhel, Nepal
²Central Department of Linguistics, Tribhuvan University, Kirtipur, Nepal
³School of Computing, Dublin City University, Dublin, Ireland
Abstract
Datasets or corpora are crucial ingredients in the development of any language technology project. However, in most situations the scarcity of these resources is a major bottleneck, especially for low-resource languages. Typically, a low-resource language lacks the technological support needed to encode its script, or the language itself, computationally. Even where such support exists, the language resources are sparsely developed and lack benchmarking mechanisms, which raises questions about the validity of any research and development that relies on them. Evidently, it is high time that low-resource languages develop specific short-, medium-, and long-term strategies to address these issues, so that they can advance research and development of language technologies for their respective languages and, if not reach parity with the high-resource languages, at least not fall too far behind them. This chapter explores the state of language computing with a particular focus on the speech and machine translation domains in the context of low-resource ...