July 2016
344 pages
English
M. Zampieri*,†
* Saarland University, Saarbrücken, Germany
† German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
Automatic language identification, or simply language identification, is the task of automatically identifying the language(s) contained in a given document. It is an important part of many text processing pipelines, including text mining applications. This chapter provides a concise overview of language identification research, from early approaches to state-of-the-art methods.
Keywords
Language identification
Text classification
n-grams
The author would like to thank Binyam Gebrekidan Gebre and Nikola Ljubešić for commenting on a draft ...