Chapter 7. Blogs, RSS, Wikipedia, and Natural Language Processing
This chapter focuses on natural language processing (NLP), the field of study concerned with how computers process and analyze human language. Before digging into the details of NLP, we'll look at some options for downloading textual data from the Web.
In this chapter, we will discuss the following topics:
- How to interact with the WordPress.com and Blogger APIs
- Web feed formats (RSS and Atom) and their use
- How to store data from blogs in the JSON format
- How to interact with the Wikipedia API to search for information about entities
- The core notions of NLP, particularly with regard to text preprocessing
- How to process textual data to identify entities mentioned in the text
Blogs and NLP
Blogs (short for weblogs) are nowadays ...