Chapter 11. Analyzing Unstructured Data with Text Mining
A great deal of data is unstructured: news articles, customer feedback, tweets, and so on. This data contains useful information, but it has to be analyzed before that information can be put to use. Text mining is a data mining technique that helps us analyze this unstructured data.
In this chapter, we'll learn the following:
- Preprocessing data
- Plotting a wordcloud from data
- Word and sentence tokenization
- Tagging parts of speech
- Stemming and lemmatization
- Applying the Stanford Named Entity Recognizer
We'll use reviews of Mad Max: Fury Road from the online portals of the BBC, Forbes, the Guardian, and Movie Pilot.
We'll extensively use the Natural Language Toolkit (NLTK) package of Python in this chapter ...
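To give a feel for the first few topics in the list above, here is a minimal sketch of preprocessing, tokenization, and word counting. It is an illustration under stated assumptions, not the chapter's method: it uses only the Python standard library rather than NLTK so that it runs without extra downloads, the sample text is invented rather than taken from the reviews, and the regular expressions are a deliberately naive stand-in for a real tokenizer.

```python
import re
from collections import Counter

# Hypothetical sample text; not taken from the reviews used in the chapter.
text = "The film is loud. The film is fast, and the chase never stops."

# Preprocess: lowercase the text and keep only runs of letters as word tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Naive sentence split: break on whitespace that follows ., ! or ?.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# Count how often each word occurs; a word cloud sizes words by counts like these.
frequencies = Counter(tokens)

print(sentences)
print(frequencies.most_common(3))
```

A dedicated tokenizer handles cases this sketch gets wrong, such as abbreviations that end in a period or contractions like "we'll", which is one reason to rely on a library for real text.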