Chapter 9

Text Mining


This chapter provides a detailed look at the emerging area of text mining and text analytics. It opens with the origins of text mining and motivates the topic with the example of IBM's Watson, the Jeopardy!-winning computer program that was built almost entirely using concepts from text and data science. It then introduces key concepts in text analytics, such as term frequency–inverse document frequency (TF-IDF) scores. Finally, it presents two hands-on case studies that show how to use RapidMiner for document clustering and for automatic gender classification based on text content.
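
To make the idea of a term frequency–inverse document frequency score concrete, here is a minimal sketch in plain Python. It is an illustration only: the chapter's case studies use RapidMiner, and the function name `tf_idf`, the sample sentences, and the specific weighting (term count divided by document length, multiplied by the natural log of the number of documents over the number of documents containing the term) are assumptions chosen for this example. Other variants of the formula are in common use.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = share of the document taken up by the term;
            # idf = log(N / df), which is 0 for a term found in every document.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up sample corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
scores = tf_idf(docs)
```

Under this weighting, a word that occurs in every document (here "the") scores zero, while a word confined to a single document (here "mat") scores higher than one shared by several documents (here "cat").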


Inverse document frequency; ...
