Chapter 14

Robust Language Identification with RapidMiner: A Text Mining Use Case

Matko Bošnjak

University of Porto, Porto, Portugal; Rudjer Boskovic Institute, Zagreb, Croatia

Eduarda Mendes Rodrigues

University of Porto, Porto, Portugal

Luis Sarmento

Sapo.pt - Portugal Telecom, Lisbon, Portugal

Acronyms

API - Application Programming Interface

ETL - Extract, Transform and Load

HTTP - HyperText Transfer Protocol

k-NN - k Nearest Neighbours

NLP - Natural Language Processing

SVM - Support Vector Machines

TF-IDF - Term Frequency - Inverse Document Frequency

UTF-8 - Unicode Transformation Format – 8-bit

XML - eXtensible Markup Language

14.1 Introduction

Language identification, the process of determining the language of machine-readable text, is an important ...
