Chapter 3. Applications of Web Scraping

While web scrapers can help almost any business, often the real trick is figuring out how. As with artificial intelligence, or really programming in general, you can’t just wave a magic wand and expect web scraping to improve your bottom line.

Applying web scraping to your business effectively takes real strategy and careful planning. You need to identify specific problems, figure out what data you need to solve those problems, and then outline the inputs, outputs, and algorithms that will allow your web scrapers to produce that data.

Classifying Projects

When planning a web scraping project, you should think about how it fits into one of several categories.

Is your web scraper “broad” or “targeted”? You can write templates to instruct a targeted web scraper but need different techniques for a broad one:

  • Will you be scraping a single website or perhaps even a fixed set of pages within that website? If so, this is an extremely targeted web scraping project. 
  • Do you need to scrape a fixed number of known websites? This is still a fairly targeted scraper, but you may need to write a small amount of custom code for each website and invest a little more time into the architecture of your web scraper. 
  • Are you scraping a large number of unknown websites and discovering new targets dynamically? Will you build a crawler that must automatically detect and make assumptions about the structure of the websites? You may be writing ...
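
To make the idea of a "template" for a targeted scraper concrete, here is a minimal sketch. It assumes that each known website can be described by where its title and body text live, and it uses only Python's standard library. The names SiteTemplate, extract, and scrape, the site names, and the sample pages are all hypothetical and chosen for illustration. Attribute values are compared by exact string match, which is a simplification.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class SiteTemplate:
    """Hypothetical per-site template: where the title and body live."""
    name: str
    title: tuple[str, str, str]  # (tag, attribute, exact attribute value)
    body: tuple[str, str, str]


class _ElementText(HTMLParser):
    """Collects the text inside the first element matching (tag, attr, value)."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.target = (tag, attr, value)
        self.depth = 0    # 0 = outside the match; >0 = nesting of same-named tags
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        target_tag, attr, value = self.target
        if self.done or tag != target_tag:
            return
        if self.depth > 0:
            self.depth += 1          # same-named tag nested inside the match
        elif dict(attrs).get(attr) == value:
            self.depth = 1           # entering the matching element

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.target[0]:
            self.depth -= 1
            if self.depth == 0:
                self.done = True     # only the first match is collected

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract(html, selector):
    """Return the whitespace-normalized text of the first matching element."""
    parser = _ElementText(*selector)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def scrape(html, template):
    """Apply one site's template to one page of that site."""
    return {
        "site": template.name,
        "title": extract(html, template.title),
        "body": extract(html, template.body),
    }


# One template per known website: the small amount of custom, per-site code.
TEMPLATES = {
    "example-news": SiteTemplate(
        "example-news", ("h1", "class", "headline"), ("div", "id", "story")
    ),
    "example-blog": SiteTemplate(
        "example-blog", ("h2", "id", "post-title"), ("div", "class", "post")
    ),
}

# Made-up sample pages standing in for downloaded HTML.
NEWS_PAGE = (
    '<html><body><h1 class="headline">Scrapers at Work</h1>'
    '<div id="story"><p>First paragraph.</p><div>Nested.</div></div>'
    "<p>Footer</p></body></html>"
)
BLOG_PAGE = '<h2 id="post-title">Hello</h2><div class="post">Body text</div>'

news = scrape(NEWS_PAGE, TEMPLATES["example-news"])
blog = scrape(BLOG_PAGE, TEMPLATES["example-blog"])
```

In this sketch, supporting another known website means adding one more entry to TEMPLATES rather than writing new parsing logic. A broad scraper cannot rely on such a lookup, because it does not know the structure of its targets in advance.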
