Learning Scrapy by Dimitrios Kouzis-Loukas

System performance

In terms of performance, results vary greatly depending on the hardware and on the number of CPUs and the amount of memory allocated to the VM. In a real deployment, we get horizontal scalability, which allows us to crawl as fast as our servers allow.

The theoretical maximum that one could get with the given settings is 3 servers ∙ 4 processes/server ∙ 16 requests in parallel ∙ 4 pages/second per request (as defined by the page download latencies) = 768 pages/second.
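
The product above can be written as a small helper. This is a minimal sketch, not code from the book: the function and parameter names are hypothetical, and it assumes that each parallel request slot completes pages at a rate equal to the reciprocal of the download latency, so 4 pages/second per slot corresponds to a latency of 0.25 seconds.

```python
def theoretical_max_rate(servers: int, processes_per_server: int,
                         concurrent_requests: int, latency_s: float) -> float:
    """Upper bound on crawl throughput in pages/second (hypothetical helper).

    Assumes every concurrent request slot finishes one page every
    `latency_s` seconds, i.e. contributes 1 / latency_s pages/second.
    """
    pages_per_slot = 1.0 / latency_s
    return servers * processes_per_server * concurrent_requests * pages_per_slot


if __name__ == "__main__":
    # 0.25 s latency gives the 4 pages/second per slot used in the text.
    print(theoretical_max_rate(3, 4, 16, 0.25))  # → 768.0
```

Expressing the per-slot rate through latency makes it easy to see how slower sites lower the ceiling: doubling the latency to 0.5 seconds halves the bound to 384 pages/second.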

In practice, using a MacBook Pro with 4 GB of RAM and 8 cores allocated to a VirtualBox VM, I crawled 50000 URLs in 2:40 (minutes:seconds), which means about 315 pages/second. On an Amazon EC2 m4.large instance with 2 vCPUs and 8 GB of RAM, it took 6:12, giving 134 pages/second because of the limited CPU capacity. On an Amazon ...
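
The measured rates follow from dividing the number of URLs by the elapsed time. The sketch below is illustrative only; the helper names are hypothetical, and it assumes the durations are written as minutes:seconds.

```python
def parse_mm_ss(elapsed: str) -> int:
    """Convert an 'm:ss' duration string into seconds (hypothetical helper)."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + int(seconds)


def pages_per_second(urls: int, elapsed: str) -> float:
    """Average crawl throughput over the whole run."""
    return urls / parse_mm_ss(elapsed)


if __name__ == "__main__":
    runs = [("VirtualBox VM", "2:40"), ("EC2 m4.large", "6:12")]
    for label, elapsed in runs:
        rate = pages_per_second(50000, elapsed)
        print(f"{label}: {rate:.1f} pages/second")
    # → VirtualBox VM: 312.5 pages/second
    # → EC2 m4.large: 134.4 pages/second
```

These averages are in line with the rounded figures quoted above and sit well below the theoretical 768 pages/second, which is expected when CPU rather than download latency is the bottleneck.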