Learning Scrapy by Dimitrios Kouzis-Loukas

Creating our custom monitoring command

If you want to monitor the progress of your crawl across multiple scrapyd servers, you have to do it manually. This is a nice opportunity to exercise everything we've seen so far by creating a primitive Scrapy command, scrapy monitor, which monitors a set of scrapyd servers. We will name the file monitor.py and add COMMANDS_MODULE = 'properties.monitor' to our settings.py. A quick look at scrapyd's documentation shows that the listjobs.json API gives us information on jobs. To find the base URL for a given target, we can reasonably guess that it must be somewhere in the code of scrapyd-deploy, so we should be able to find it in a single file. If we take a look at https://github.com/scrapy/scrapyd-client/blob/master/scrapyd-client/scrapyd-deploy ...
