Chapter 11. Analyzing Unstructured Data with Text Mining

A great deal of data is unstructured: news articles, customer feedback, tweets, and so on. This data contains useful information, but it has to be analyzed before that information can be used. Text mining is a data mining technique that helps us analyze such unstructured data.

In this chapter, we'll learn the following:

  • Preprocessing data
  • Plotting a wordcloud from data
  • Word and sentence tokenization
  • Tagging parts of speech
  • Stemming and lemmatization
  • Applying the Stanford Named Entity Recognizer

Preprocessing data

We'll use reviews of Mad Max: Fury Road from the online portals of the BBC, Forbes, the Guardian, and Movie Pilot.

We'll make extensive use of Python's Natural Language Toolkit (NLTK) package in this chapter ...
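To give a rough idea of what preprocessing usually involves, here is a minimal sketch that uses only the Python standard library rather than NLTK. It lowercases a piece of text, strips punctuation, splits it into words, and drops common stop words. The sample sentence, the tiny stop-word list, and the preprocess function name are all assumptions made for illustration; they are not taken from the reviews or from NLTK.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for illustration (not from the actual reviews).
SAMPLE = ("Fury Road is loud, fast and relentless. "
          "The chase NEVER stops, and the film is better for it!")

# A tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "for", "of", "to", "in"}


def preprocess(text):
    """Lowercase the text, strip punctuation, split into words, drop stop words."""
    text = text.lower()
    # Keep runs of letters, allowing one internal apostrophe (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [w for w in words if w not in STOP_WORDS]


tokens = preprocess(SAMPLE)
counts = Counter(tokens)  # word frequencies, the raw material for a word cloud

print(tokens)
print(counts.most_common(3))
```

A hand-written stop-word list and a regular expression are enough for a quick look at a small text, but they miss many cases (contractions, hyphenated words, non-English text), which is one reason to rely on a dedicated toolkit for real work.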

