May 2017
Beginner to intermediate
220 pages
5h 2m
English
Sometimes when scraping a website, it can be useful to pause the crawl and resume it later without starting over from the beginning. For example, you may need to interrupt the crawl to restart your computer after a software update, or the website you are crawling may be returning errors and you want to continue the crawl later.
Conveniently, Scrapy comes with built-in support for pausing and resuming crawls, so we do not need to modify our example spider. To enable this feature, we just need to define the JOBDIR setting with a directory where the current state of a crawl can be saved. Note that separate directories must be used to save the state of multiple crawls.
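As a rough sketch of how the JOBDIR setting could be supplied, the helper below builds a `scrapy crawl` command line that passes JOBDIR through the `-s` option and gives each crawl its own directory. The function name `build_crawl_command`, the spider name `country`, the `crawls` base directory, and the run identifier are all assumptions made for illustration, not names taken from the book.

```python
from pathlib import Path


def build_crawl_command(spider_name, run_id, base_dir="crawls"):
    """Return an argv list that runs a spider with a dedicated JOBDIR.

    Hypothetical helper: the directory layout is an assumption. Each
    (spider_name, run_id) pair maps to its own directory, because
    separate crawls must not share saved state.
    """
    job_dir = Path(base_dir) / f"{spider_name}-{run_id}"
    # -s NAME=VALUE overrides a Scrapy setting for this run only.
    return ["scrapy", "crawl", spider_name, "-s", f"JOBDIR={job_dir.as_posix()}"]


if __name__ == "__main__":
    # "country" is a placeholder spider name.
    print(" ".join(build_crawl_command("country", 1)))
```

Running the printed command starts the crawl. Pressing Ctrl+C once asks Scrapy to shut down gracefully and save its state, and issuing the same command again with the same JOBDIR resumes the crawl from where it stopped.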
Here is an example using this feature ...