Chapter 9. Working with HTML: Web Scraping

In a perfect world, it would be easy to get your hands on all the data you need.

Alas, this is rarely true. Case in point: data published on the web. Data embedded in HTML is designed to be rendered by web browsers and read by humans. But what if you need to process HTML-embedded data with code? Are you out of luck? Well, as luck would have it, Python is something of a star when it comes to scraping data from web pages, and in this chapter you'll learn how to do just that. You'll also learn how to parse those scraped HTML pages to extract usable data. Along the way, you'll meet slices and soup. But don't worry: this is still Head First Python, not Head First Cooking.

The Coach needs more data

There’s no harm in asking.

You’ve sat down with the Coach over coffee and he’s explained what he wants. In addition to the current bar chart, the Coach wants to see the current world records for both men and women, for both course lengths, for any selected distance and stroke. The Coach is convinced that sharing the world record times with his swimmers gives them “something to aim for.”

The Coach even sketched out his idea on the back of a paper napkin.

Cubicle Conversation

Alex: Can’t we just show one number at the bottom instead ...