Exercise 52

Scraping Data from the Web

Two of the most popular uses of Python are data science and web scraping, and the two go together because web scraping typically feeds data to your data science pipeline. If you have an application that needs beer sales data, then scraping it off the ATF TTB website is probably your only solution. If you need to train a GPT model on text, then scraping it off various forum websites is a good option. The web has so much data available; it’s just in unfriendly visual formats.
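To make "unfriendly visual formats" concrete, here is a minimal sketch of turning an HTML table into plain Python records using only the standard library's `html.parser`. The HTML fragment, its numbers, and the names `SAMPLE_HTML`, `TableParser`, and `scrape_table` are all made up for illustration; a real scraper would first download the page and would have to cope with much messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment with invented numbers. A real scraper would
# download the page text first instead of hard-coding it.
SAMPLE_HTML = """
<table>
  <tr><th>Month</th><th>Barrels</th></tr>
  <tr><td>January</td><td>1200</td></tr>
  <tr><td>February</td><td>1350</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per <tr>
        self.in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False
            # Trim the whitespace around the finished cell's text.
            self.rows[-1][-1] = self.rows[-1][-1].strip()

    def handle_data(self, data):
        # Text can arrive in several chunks, so append rather than assign.
        if self.in_cell:
            self.rows[-1][-1] += data


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
print(records)
```

Once the rows are dictionaries, they can be handed to whatever data tools come next. The values are still strings at this point, so converting `Barrels` to a number would be a separate cleaning step.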

Web scraping is also a great beginner topic for many of the same reasons as data munging:

  1. It’s something everyone understands because they use browsers all day long. Most people have some concept of what a web page is.

  2. Web scraping doesn’t require a ton of theory or ...

Get Learn Python the Hard Way: A Deceptively Simple Introduction to the Terrifyingly Beautiful World of Computers and Data Science, 5th Edition now with the O’Reilly learning platform.
