Chapter 7.  Blogs, RSS, Wikipedia, and Natural Language Processing

This chapter focuses on natural language processing (NLP), the field of study concerned with how computers process and analyze human language. Before digging into the details of NLP, we'll look at some options for downloading textual data from the Web.

In this chapter, we will discuss the following topics:

  • How to interact with the WordPress.com and Blogger APIs
  • Web feed formats (RSS and Atom) and their use
  • How to store data from blogs in the JSON format
  • How to interact with the Wikipedia API to search for information about entities
  • The core notions of NLP, particularly with regard to text preprocessing
  • How to process textual data to identify entities mentioned in the text
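
To give a flavor of two of the topics listed above (text preprocessing and storing blog data as JSON), here is a minimal sketch that uses only the Python standard library. The post record, its field names, the tiny stop-word list, and the preprocess helper are all hypothetical and chosen for illustration; they are not taken from any particular blog API or from the chapter's own code.

```python
import json
import re

# Hypothetical blog post record; the field names are illustrative only.
post = {
    "title": "Hello NLP",
    "content": "Natural language processing is fun. "
               "Processing text is the first step!",
}

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"is", "the", "a", "an", "of", "and"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


# Attach the preprocessed tokens to the record.
post["tokens"] = preprocess(post["content"])

# Serialize the record as a JSON string and read it back.
serialized = json.dumps(post)
restored = json.loads(serialized)
```

A realistic pipeline would likely rely on a dedicated NLP library for tokenization and on a much larger stop-word list; the regular expression here is only a simple stand-in.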

Blogs and NLP

Blogs (short for weblogs) are nowadays ...
