Chapter 9. Analyzing Textual Data and Social Media

In the previous chapters, we focused on the analysis of structured data, mostly in tabular format. In reality, plain text is the most predominant form of data available today. Text analysis applies analysis of word frequency distributions, pattern recognition, tagging, link and association analysis, sentiment analysis, and visualization. We will analyze text with the Python Natural Language Toolkit (NLTK) library. NLTK comes with a collection of sample texts called corpora. A small example of network analysis will also be covered. The following topics will be discussed in this chapter:

  • Installing NLTK
  • Filtering out stopwords, names, and numbers
  • The bag-of-words model
  • Analyzing word frequencies
  • Naive ...

Get Python Data Analysis now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.