11 Bridging the Linguistic Gap: A Deep Learning-Based Image-to-Text Converter for Ancient Tamil with Web Interface

S. Umamaheswari¹*, G. Gowtham¹ and K. Harikumar²

¹ ECE, Kumaraguru College of Technology, Coimbatore, India

² Engineering and Industry Services, Tata Consultancy Services, Chennai, India

Abstract

Translating inscriptions in low-resource languages, particularly ancient ones engraved in materials such as stone or metal, poses significant challenges for researchers in epigraphy, a discipline within archaeology. Faint engravings and missing fragments often complicate translation and make it time-consuming. This chapter proposes using deep learning algorithms to convert images of Tamil inscriptions into modern Tamil text. The resulting text can then be rendered in other languages through the Google Translate API. Once the model is trained, the plan is to deploy it behind a user-friendly interface, possibly a webpage or web application, to make it accessible to a wider audience. The Tamil script is distinct: it has 12 vowels and 18 consonants and is written from left to right. Early Tamil inscriptions, mainly found in the Pandya, Chola, and Chera kingdoms, took the form of poems and documented heroic deeds such as battles and conquests. Ancient Tamil, one of the oldest low-resource languages, is preserved in inscriptions on temple walls and in palm-leaf manuscripts (“Olaichuvadi”). These inscriptions serve as valuable ...
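The character inventory mentioned in the abstract (12 vowels and 18 consonants) can be made concrete with a short sketch. It enumerates the modern Tamil letters from their Unicode code points and combines each consonant with each vowel sign, which gives one plausible label set for a character classifier that outputs modern Tamil text. The function names `compound_letters` and `build_label_map` are hypothetical, and the label set the chapter actually uses is not stated here, so this is an illustrative assumption rather than the authors' method.

```python
# Illustrative sketch only: one possible label set for modern Tamil output.
# Names below are hypothetical; the chapter's real label set is not given.

# 12 independent vowels (Unicode Tamil block).
VOWELS = [chr(c) for c in (
    0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
    0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94,
)]

# 18 native consonants (each carries an inherent "a" vowel).
CONSONANTS = [chr(c) for c in (
    0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
    0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
    0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9,
)]

# Dependent vowel signs; the empty string stands for the inherent "a".
VOWEL_SIGNS = [""] + [chr(c) for c in (
    0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2, 0x0BC6,
    0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC,
)]

# The pulli (virama) removes the inherent vowel from a consonant.
PULLI = chr(0x0BCD)


def compound_letters():
    """Return every consonant-vowel combination (18 x 12 letters)."""
    return [c + s for c in CONSONANTS for s in VOWEL_SIGNS]


def build_label_map():
    """Map each symbol to an integer class index for a classifier."""
    pure_consonants = [c + PULLI for c in CONSONANTS]
    symbols = VOWELS + pure_consonants + compound_letters()
    return {symbol: index for index, symbol in enumerate(symbols)}


if __name__ == "__main__":
    labels = build_label_map()
    print(len(VOWELS), len(CONSONANTS), len(compound_letters()), len(labels))
```

A recognizer built this way would predict one index per character image and map it back to a Unicode string. Inscriptional glyph shapes differ from modern ones, so the mapping from image to label is what the model has to learn.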

Source: Automatic Speech Recognition and Translation for Low Resource Languages, available on the O’Reilly learning platform.
