Chapter 11. Analyzing Unstructured Data with Text Mining

A great deal of unstructured data, such as news articles, customer feedback, and tweets, contains information worth analyzing. Text mining is a data mining technique that helps us analyze this unstructured data.

In this chapter, we'll cover the following topics:

  • Preprocessing data
  • Plotting a word cloud from data
  • Word and sentence tokenization
  • Tagging parts of speech
  • Stemming and lemmatization
  • Applying the Stanford Named Entity Recognizer

Preprocessing data

We'll use reviews of Mad Max: Fury Road from the online portals of the BBC, Forbes, the Guardian, and Movie Pilot.

We'll extensively use the Natural Language Toolkit (NLTK) package of Python in this chapter ...
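To give a rough idea of what preprocessing usually involves, here is a minimal sketch that uses only the Python standard library rather than NLTK. It lowercases the text, splits it into word tokens, and drops common stop words. The `preprocess` function, the tiny `STOP_WORDS` set, and the sample sentence are all assumptions made for illustration; they are not taken from the chapter or from the reviews.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real analysis would use a much fuller list.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it", "this"}


def preprocess(text):
    """Lowercase the text, extract word tokens, and drop stop words."""
    # Keep runs of letters and apostrophes; punctuation and digits are discarded.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


# Made-up sample sentence, not a quotation from any review.
sample = "The chase is relentless, and the stunts in this film are a marvel."

tokens = preprocess(sample)
word_counts = Counter(tokens)  # term frequencies, e.g. as input to a word cloud

print(tokens)
print(word_counts.most_common(3))
```

Counting the cleaned tokens with `Counter` is one simple way to get the term frequencies that a word cloud or similar summary would be built from.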
