Chapter 20. Web Scraping Proxies

That this is the last chapter in the book is somewhat appropriate. Until now, you have been running all of your Python applications from the command line, within the confines of your home computer. As the saying goes: “If you love something, set it free.”

Although you might be tempted to put off this step as something you don’t need right now, you may be surprised at how much easier your life becomes when you stop trying to run Python scrapers from your laptop.

What’s more, since the first edition of this book was published in 2015, a whole industry of web scraping proxy companies has emerged and flourished. Running a web scraper somewhere other than your own machine used to be a matter of paying for a cloud server instance and running your scraper on it as you would any other software. Now, you can make an API request that essentially says “fetch this website,” and a remote program will take care of the details, handle any security issues, and return the data to you (for a fee, of course!).
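To make the “fetch this website” idea concrete, here is a minimal sketch using only Python’s standard library. The endpoint https://api.scraper.example/v1/fetch and the url and api_key query parameters are hypothetical placeholders: each provider defines its own endpoint, parameter names, and authentication scheme, so check its documentation before adapting this.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: every scraping-proxy provider defines its own URL,
# parameter names, and authentication scheme.
API_ENDPOINT = "https://api.scraper.example/v1/fetch"


def build_fetch_request(target_url: str, api_key: str) -> Request:
    """Build, but do not send, a request asking the service to fetch target_url."""
    # urlencode percent-encodes the target URL so its own "?" and "&"
    # characters don't collide with the outer query string.
    query = urlencode({"url": target_url, "api_key": api_key})
    return Request(f"{API_ENDPOINT}?{query}", method="GET")


def fetch_via_api(target_url: str, api_key: str, timeout: float = 60.0) -> str:
    """Send the request; the remote service fetches the page and returns its body."""
    with urlopen(build_fetch_request(target_url, api_key), timeout=timeout) as response:
        # Assumes the service returns UTF-8 text; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")
```

Keeping build_fetch_request separate from fetch_via_api lets you inspect the exact URL you are about to send (and pay for) before any network traffic happens.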

In this chapter, we’ll look at some methods that will allow you to route your requests through remote IP addresses, host and run your software elsewhere, and even offload the work to a web scraping proxy entirely.
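The first of these techniques, routing requests through a remote IP address, can be sketched with the standard library’s ProxyHandler. The proxy address proxy.example.net:8080 and the user:password credentials below are hypothetical placeholders; substitute whatever host, port, and login your proxy provider issues.

```python
from urllib.request import OpenerDirector, ProxyHandler, build_opener

# Hypothetical proxy URL: replace host, port, and credentials with the
# values your proxy provider gives you.
PROXY_URL = "http://user:password@proxy.example.net:8080"


def make_proxied_opener(proxy_url: str) -> OpenerDirector:
    """Return an opener that routes both HTTP and HTTPS traffic through proxy_url."""
    # Passing our own ProxyHandler replaces the default one, which would
    # otherwise pick up the system or environment proxy settings.
    handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    return build_opener(handler)


def fetch_through_proxy(url: str, proxy_url: str = PROXY_URL,
                        timeout: float = 30.0) -> bytes:
    """Fetch url via the proxy, so the target site sees the proxy's IP address."""
    opener = make_proxied_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Because the credentials are embedded in the proxy URL, urllib adds the proxy authorization header for you. HTTPS requests are tunneled through the proxy with the CONNECT method, so the proxy sees which host you are contacting but not the encrypted page contents.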

Why Use Remote Servers?

Although using a remote server might seem like an obvious step when launching a web application intended for a wide audience, the tools programmers build for their own purposes are often left running locally. In the absence of a motivation ...

Get Web Scraping with Python, 3rd Edition now with the O’Reilly learning platform.