3

Building Your First Web Scraping Application

The internet, and the World Wide Web (WWW) in particular, is probably the most prominent source of information today. Most of that information is retrievable through HTTP, which was originally invented to share pages of hypertext (hence the name HyperText Transfer Protocol) and which started the WWW.

An HTTP request is sent, and a response comes back, each time that we open a web page in a browser, so the process should be familiar to almost everyone. But we can also perform these operations programmatically to retrieve and process information automatically. Python has an HTTP client in its standard library, but the fantastic requests module makes obtaining web pages very easy. In this chapter, we will see how.
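As a rough illustration of performing that request and response cycle from code, here is a minimal sketch that uses only the standard library, so it runs without a network connection. The handler class `HelloHandler`, the helper names `fetch_page` and `demo`, and the tiny HTML page are hypothetical, made up for this example; they are not taken from the chapter's recipes.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one tiny hypertext page."""

    def do_GET(self):
        body = b"<html><body><h1>Hello</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_page(url):
    """Send a GET request and return (status code, decoded body)."""
    # An empty ProxyHandler ignores proxy settings from the environment,
    # so the request to the local server is sent directly.
    opener = build_opener(ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.status, response.read().decode("utf-8")


def demo():
    """Start a throwaway local server, fetch its page, and shut it down."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_page(f"http://127.0.0.1:{port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, html = demo()
    print(status)
    print(html)
```

With the requests module installed, the body of `fetch_page` would reduce to roughly `response = requests.get(url)` followed by reading `response.status_code` and `response.text`, which is the kind of convenience the paragraph above refers to.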

In this chapter, we'll cover the following recipes: ...
