Chapter 14
Robust Language Identification with RapidMiner: A Text Mining Use Case
Matko Bošnjak
University of Porto, Porto, Portugal; Rudjer Boskovic Institute, Zagreb, Croatia
Eduarda Mendes Rodrigues
University of Porto, Porto, Portugal
Luis Sarmento
Sapo.pt - Portugal Telecom, Lisbon, Portugal
Acronyms
API - Application Programming Interface
ETL - Extract, Transform and Load
HTTP - HyperText Transfer Protocol
k-NN - k Nearest Neighbours
NLP - Natural Language Processing
SVM - Support Vector Machines
TF-IDF - Term Frequency - Inverse Document Frequency
UTF-8 - Unicode Transformation Format – 8-bit
XML - eXtensible Markup Language
14.1 Introduction
Language identification, the process of determining the language of machine-readable text, is an important ...