Chapter 17. Scraping Remotely

In the last chapter, you looked at running web scrapers across multiple threads and processes, where communication between them was somewhat limited or had to be carefully planned. This chapter brings this concept to its logical conclusion—running crawlers not just in separate processes, but on entirely separate machines.

It is somewhat appropriate that this is the last technical chapter in the book. Until now you have been running all the Python applications from the command line, within the confines of your home computer. Sure, you might have installed MySQL in an attempt to replicate the environment of a real-life server. But it’s just not the same. As the saying goes: “If you love something, set it free.”

This chapter covers several methods for running scripts from different machines, or even just different IP addresses on your own machine. Although you might be tempted to put this step off as something you don’t need right now, you might be surprised at how easy it is to get started with the tools you already have (such as a personal website on a paid hosting account), and how much easier your life becomes when you stop trying to run Python scrapers from your laptop.

Why Use Remote Servers?

Although using a remote server might seem like an obvious step when launching a web app intended for use by a wide audience, more often than not the tools we build for our own purposes are left running locally. People who decide to push onto a remote platform ...
