3

Building Your First Web Scraping Application

The internet, and the World Wide Web (WWW) in particular, is probably the most prominent source of information today. Most of that information is retrievable through HTTP, a protocol originally invented to share pages of hypertext (hence the name HyperText Transfer Protocol), which is what started the WWW.

An HTTP request and response happen each time we load a web page in a browser, so the process should be familiar to almost everyone. But we can also perform these operations programmatically to retrieve and process information automatically. Python has an HTTP client in its standard library, but the fantastic requests module makes obtaining web pages very easy. In this chapter, we will see how.
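As a minimal sketch of retrieving a page programmatically, the following uses only the standard library HTTP client mentioned above. So that it runs without network access, it first starts a throwaway local server; the `HelloHandler` class, the `fetch_page` helper, and the page content are hypothetical names invented for this illustration, not taken from the chapter's recipes.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one small HTML page."""

    def do_GET(self):
        body = b"<html><body><h1>Hello</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


# An opener with no proxies, since the target here is a local server.
opener = build_opener(ProxyHandler({}))


def fetch_page(url):
    """Send a GET request and return (status code, decoded body)."""
    with opener.open(url, timeout=5) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.status, response.read().decode(charset)


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status, html = fetch_page(f"http://127.0.0.1:{server.server_port}/")

server.shutdown()
server.server_close()

print(status)
print(html)
```

With the third-party requests module installed, the fetch step would shrink to roughly `response = requests.get(url)`, with the status available as `response.status_code` and the decoded body as `response.text`.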

In this chapter, we'll cover the following recipes: ...
