Chapter 9

Text Mining

Abstract

This chapter provides a detailed look at the emerging area of text mining and text analytics. It begins with the origins of text mining and motivates the topic with the example of IBM's Watson, the Jeopardy!-winning computer program that was built almost entirely using concepts from text and data science. The chapter then introduces key concepts in text analytics, such as term frequency–inverse document frequency (TF–IDF) scores. Finally, it presents two hands-on case studies that show how to use RapidMiner to address problems such as document clustering and automatic gender classification based on text content.

Keywords

Inverse document frequency; ...

