November 2018
Intermediate to advanced
300 pages
7h 42m
English
In this chapter, we'll first discuss what text mining is, what kinds of analysis it offers, and why you might want to use it in your application. We'll then discuss how to work with Mallet, a Java library for natural-language processing, covering data import and text pre-processing. Afterward, we'll look at two text-mining applications: topic modeling, where we'll see how text mining can identify the topics in a collection of text documents without anyone reading them individually, and spam detection, where we'll see how to automatically classify text documents into categories.
This chapter will cover the following topics: