O'Reilly logo

Python Data Analysis by Ivan Idris

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 9. Analyzing Textual Data and Social Media

In the previous chapters, we focused on the analysis of structured data, mostly in tabular format. In reality, plain text is the most predominant form of data available today. Text analysis applies analysis of word frequency distributions, pattern recognition, tagging, link and association analysis, sentiment analysis, and visualization. We will analyze text with the Python Natural Language Toolkit (NLTK) library. NLTK comes with a collection of sample texts called corpora. A small example of network analysis will also be covered. The following topics will be discussed in this chapter:

  • Installing NLTK
  • Filtering out stopwords, names, and numbers
  • The bag-of-words model
  • Analyzing word frequencies
  • Naive ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required