Scrapy Performance Tuning
If we compare the start and end times of the initial full scrape of the example site, we can see that the scrape took approximately 1,697 seconds, or about 6 seconds per page on average. We did not use Scrapy's concurrency features, and we added a delay of ~5 seconds between requests, so Scrapy is parsing and extracting data at around 1 second per page (recall from Chapter 2, Scraping the Data, that our fastest scraper using XPath took 1.07 seconds). I gave a talk at PyCon 2014 comparing the speed of web scraping libraries, and even then Scrapy was massively faster than any other scraping framework I could find. I was able to write a simple ...
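The per-page estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function and constant names are hypothetical, the 6-second average and 5-second delay are the rounded figures quoted in the text, and the page count is inferred from those figures rather than read from the scrape log.

```python
def processing_seconds_per_page(avg_seconds_per_page, delay_seconds):
    """Time per page that remains once the deliberate delay is subtracted."""
    return avg_seconds_per_page - delay_seconds


def implied_page_count(total_seconds, avg_seconds_per_page):
    """Approximate number of pages implied by the total and the average."""
    return round(total_seconds / avg_seconds_per_page)


# Rounded figures quoted in the text (assumed, not re-measured).
TOTAL_SECONDS = 1697        # wall-clock time of the full scrape
AVG_SECONDS_PER_PAGE = 6    # approximate average per page
DELAY_SECONDS = 5           # approximate delay added between requests

processing = processing_seconds_per_page(AVG_SECONDS_PER_PAGE, DELAY_SECONDS)
pages = implied_page_count(TOTAL_SECONDS, AVG_SECONDS_PER_PAGE)
delay_share = DELAY_SECONDS / AVG_SECONDS_PER_PAGE  # fraction of time spent waiting

print(f"Processing time per page: ~{processing} s")
print(f"Implied page count: ~{pages}")
print(f"Share of run time spent in the delay: {delay_share:.0%}")
```

With these rounded inputs, most of the run time comes from the delay rather than from parsing and extraction, which is why the delay and concurrency are the first things to examine when tuning.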