Reading raw text from the Web

Most of the time, free-form text is found in text files. This recipe does not cover how to read them, as we have already presented many ways of doing so. (Refer to the set of recipes in Chapter 1, Preparing the Data.)

Note

One way of reading a file that we have not explored yet will be discussed in the next recipe.

Often, however, we need to read data straight from the web: we might want to analyze a blog post, scrape an article, or analyze Facebook or Twitter posts. Facebook and Twitter offer Application Programming Interfaces (APIs) that normally return responses in XML or JSON format; processing HTML files, by contrast, is not as straightforward.
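To illustrate why HTML takes more work than an API response, here is a minimal sketch that uses only the Python standard library. The JSON payload, the HTML page, and the names TextExtractor and html_to_text are all made up for illustration; this is not the method the recipe itself goes on to use.

```python
import json
from html.parser import HTMLParser

# Hypothetical API-style payload: the text sits under an explicit key.
api_response = '{"user": "alice", "text": "Hello, world!"}'
post = json.loads(api_response)

# Hypothetical HTML page: the same text is mixed in with markup,
# styling, and script code that we do not want to analyze.
html_page = """
<html><head><title>Blog</title><style>p {color: red;}</style></head>
<body><h1>Post</h1><p>Hello, <b>world</b>!</p>
<script>var x = 1;</script></body></html>
"""


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML string as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    print(post["text"])             # one dictionary lookup
    print(html_to_text(html_page))  # requires walking the markup
```

With JSON, a single key lookup returns the text. With HTML, we have to walk the markup and decide which parts count as content, and text split across inline tags (such as the bold word above) comes back in separate pieces that need to be rejoined.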

In this recipe, you will learn how to access a web ...
