Learning Scrapy by Dimitrios Kouzis-Loukas


Running a distributed crawl

I typically use four terminals to get a complete view of the progress of our crawl. To keep this section self-contained, I also provide the vagrant ssh commands you need to open terminals to the relevant servers.

Figure: Using four terminals to oversee a crawl

With terminal 1, I like to monitor CPU and memory usage across all the servers. This helps me identify and fix potential problems. To set it up, I run the following:

$ alias provider_id="vagrant global-status --prune | grep 'docker-provider' | awk '{print \$1}'"
$ vagrant ssh $(provider_id)
$ docker ps --format "{{.Names}}" | xargs docker ...
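The last command above is truncated. One way to get the kind of live CPU and memory view described here (my own completion, not necessarily the book's exact command) is to feed the container names to docker stats, which streams per-container usage figures:

$ # list running container names and pass them to docker stats for a live view
$ docker ps --format "{{.Names}}" | xargs docker stats

docker stats refreshes its output in place, so leaving it running in terminal 1 gives a continuously updated table of CPU %, memory usage, and network I/O for every container taking part in the crawl.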
