February 2018
Beginner to intermediate
364 pages
10h 32m
English
There are a number of techniques for controlling concurrency levels, and the process can become quite complicated when it involves managing multiple requests and threads of execution. We won't discuss how this is done at the thread level here; we only cover the construct built into Scrapy.
Scrapy is inherently concurrent in its requests. By default, Scrapy will dispatch at most eight simultaneous requests to any given domain. You can change this using the CONCURRENT_REQUESTS_PER_DOMAIN setting. The following sets the value to 1 concurrent request:
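from scrapy.crawler import CrawlerProcess

# Allow at most one in-flight request per domain
process = CrawlerProcess({'CONCURRENT_REQUESTS_PER_DOMAIN': 1})
process.crawl(Spider)  # Spider is the spider class used in this recipe
process.start()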
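As a minimal sketch of how these limits might be packaged for reuse, the helper below builds the settings dictionary separately from starting the crawl. The function names concurrency_settings and run_spider are hypothetical and not part of Scrapy. CONCURRENT_REQUESTS is Scrapy's global cap across all domains (16 by default) and is included here only as an assumption about what you may also want to tune.

```python
def concurrency_settings(per_domain=1, total=16):
    """Build a Scrapy settings dict that caps simultaneous requests.

    per_domain -> CONCURRENT_REQUESTS_PER_DOMAIN (Scrapy's default is 8)
    total      -> CONCURRENT_REQUESTS, the global cap (Scrapy's default is 16)
    Both function and parameter names are illustrative, not Scrapy API.
    """
    if per_domain < 1 or total < 1:
        raise ValueError("concurrency limits must be at least 1")
    return {
        'CONCURRENT_REQUESTS_PER_DOMAIN': per_domain,
        'CONCURRENT_REQUESTS': total,
    }


def run_spider(spider_cls, settings):
    """Run spider_cls with the given settings; blocks until the crawl ends."""
    # Imported lazily so concurrency_settings() is usable without Scrapy.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings)
    process.crawl(spider_cls)
    process.start()
```

Calling run_spider(Spider, concurrency_settings(per_domain=1)) would then behave like the snippet above, with the global limit stated explicitly. Keeping the dictionary in its own function makes it easy to inspect the limits before a crawl starts.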