Chapter 9. Analyzing Textual Data and Social Media

In the previous chapters, we focused on the analysis of structured data, mostly in tabular format. In practice, a large share of the data available today is plain text. Text analysis includes the analysis of word frequency distributions, pattern recognition, tagging, link and association analysis, sentiment analysis, and visualization. We will analyze text with the Python Natural Language Toolkit (NLTK) library, which comes with a collection of sample texts called corpora. We will also cover a small example of network analysis. The following topics will be discussed in this chapter:

Installing NLTK
Filtering out stopwords, names, and numbers
The bag-of-words model
Analyzing word frequencies
Naive ...
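
To give a feel for two of the topics listed above, filtering out stopwords and numbers and analyzing word frequencies, here is a minimal sketch. It deliberately uses only the Python standard library rather than NLTK, so it runs without installing anything or downloading corpora. The stopword set, the sample sentence, and the function name word_frequencies are hypothetical and chosen purely for illustration; NLTK ships a much fuller stopword corpus and its own frequency distribution class.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set (illustration only).
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase alphabetic tokens, skipping stopwords.

    The regular expression keeps runs of the letters a-z only, so
    numbers and punctuation are dropped as a side effect.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


# Hypothetical sample text.
sample = "The cat sat in the hat. The hat is a hat of 2 colors."
freqs = word_frequencies(sample)
print(freqs.most_common(1))  # [('hat', 3)]
```

The letters-only tokenizer is a simplification: it splits contractions such as "don't" into two tokens and discards non-ASCII letters, which a proper tokenizer would handle more carefully.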

Python Data Analysis by Ivan Idris
