This chapter provides a detailed look at the emerging area of text mining and text analytics. It begins with the origins of text mining and motivates the topic using the example of IBM's Watson, the Jeopardy!-winning computer program that was built almost entirely using concepts from text and data science. It then introduces key concepts in text analytics, such as term frequency–inverse document frequency (TF-IDF) scores. Finally, it describes two hands-on case studies that show how to use RapidMiner to address problems such as document clustering and automatic gender classification based on text content.
Inverse document frequency; ...