Part II. Advanced Scraping

You’ve laid some web-scraping groundwork; now comes the fun part. Up until this point our web scrapers have been relatively dumb. They’re unable to retrieve information unless it’s immediately presented to them in a nice format by the server. They take all information at face value and simply store it without any analysis. They get tripped up by forms, website interaction, and even JavaScript. In short, they’re no good for retrieving information unless that information really wants to be retrieved.

This part of the book will help you analyze raw data to get the story beneath the data—the story that websites often hide beneath layers of JavaScript, login forms, and antiscraping measures.

You’ll learn how to use web scrapers to test your sites, automate processes, and access the Internet on a large scale. By the end of this section, you should have the tools to gather and manipulate nearly any type of data, in any form, across any part of the Internet.

Get Web Scraping with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.