
Learning Scrapy by Dimitrios Kouzis-Loukas


Troubleshooting flow

To summarize, Scrapy is designed to have the downloader as its bottleneck. Start with a low value of CONCURRENT_REQUESTS and increase it until just before you hit one of the following limits:

  • CPU usage > 80-90%
  • Source website latency increasing excessively
  • Memory limit of 5 MB of Responses in your scraper
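
The tuning loop described above can be sketched as a small helper. This is only an illustration, not part of Scrapy: the Measurement fields, the function names, and the thresholds for CPU (80%, the lower end of the stated range) and latency (twice the first run's latency, as one reading of "increasing excessively") are assumptions made for the example. Only the 5 MB Response budget comes from the text.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical snapshot of one benchmark run; field names are illustrative.
@dataclass
class Measurement:
    concurrent_requests: int    # CONCURRENT_REQUESTS used for this run
    cpu_percent: float          # average CPU usage during the run
    latency_s: float            # average download latency, in seconds
    scraper_active_bytes: int   # total size of Responses held by the scraper


MAX_CPU_PERCENT = 80.0          # assumed: lower end of the 80-90% range
MAX_LATENCY_FACTOR = 2.0        # assumed meaning of "increasing excessively"
MAX_SCRAPER_BYTES = 5_000_000   # the 5 MB Response limit from the text


def hit_limits(m: Measurement, baseline_latency_s: float) -> List[str]:
    """Return the names of the limits that this run has reached."""
    limits = []
    if m.cpu_percent > MAX_CPU_PERCENT:
        limits.append("cpu")
    if m.latency_s > MAX_LATENCY_FACTOR * baseline_latency_s:
        limits.append("latency")
    if m.scraper_active_bytes >= MAX_SCRAPER_BYTES:
        limits.append("memory")
    return limits


def pick_concurrency(runs: List[Measurement]) -> int:
    """Pick the highest CONCURRENT_REQUESTS tried before any limit was hit.

    `runs` must be non-empty and ordered by increasing concurrent_requests.
    The first run's latency serves as the baseline; if even the first run
    hits a limit, its value is returned unchanged.
    """
    baseline = runs[0].latency_s
    best = runs[0].concurrent_requests
    for m in runs:
        if hit_limits(m, baseline):
            break
        best = m.concurrent_requests
    return best
```

In practice the measurements would come from repeated short crawls, each with a higher CONCURRENT_REQUESTS than the last. The helper only formalizes the rule of stopping just before the first limit.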

At the same time, also do the following:

  • Keep at least a few Requests in the scheduler's queues (mqs/dqs) at all times so that the downloader is never starved of URLs
  • Never use blocking or CPU-intensive code
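
One way to check the first point is to sample the scheduler's queue lengths periodically during a crawl and measure how often they run nearly empty. The sketch below assumes you already have those samples as (mqs length, dqs length) pairs. The function name and the threshold of three Requests, standing in for "a few", are hypothetical.

```python
from typing import Iterable, Tuple


def starvation_ratio(samples: Iterable[Tuple[int, int]], min_queued: int = 3) -> float:
    """Fraction of samples in which the scheduler held too few Requests.

    Each sample is a (mqs_length, dqs_length) pair taken at one moment.
    A sample counts as starved when the two queues together hold fewer
    than `min_queued` Requests (an assumed threshold).
    """
    samples = list(samples)
    if not samples:
        return 0.0
    starved = sum(1 for mqs, dqs in samples if mqs + dqs < min_queued)
    return starved / len(samples)
```

A ratio well above zero suggests that the spider is not yielding URLs fast enough to keep the downloader busy.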

Figure 13. Troubleshooting Scrapy's performance problems

Figure 13 summarizes the procedure of diagnosing and repairing Scrapy's ...
