
Learning Scrapy by Dimitrios Kouzis-Loukas

The key takeaway

The most important takeaway of this chapter is that if you are about to perform distributed crawling, always use suitably sized batches.

Depending on how fast your source websites respond, a batch may contain hundreds, thousands, or tens of thousands of URLs. You want batches large enough to take a few minutes each to process, so that any startup costs are sufficiently amortized. On the other hand, you don't want them to be too large, because that would turn a single machine failure into a major risk. In a fault-tolerant distributed system you would retry failed batches, and you wouldn't want each retry to repeat hours' worth of work.
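As an illustration, the sizing rule above can be sketched in Python. This is a minimal sketch, not the book's code or part of Scrapy's API. It assumes you have already measured roughly how many URLs per second a site can be crawled at. The three-minute target and the names `batch_size_for`, `make_batches`, `run_with_retries`, and `crawl_batch` are hypothetical. The key point is that a failure costs at most one batch's worth of work, because only that batch is retried.

```python
import math
from typing import Callable, Iterable, Iterator, List, Sequence


def batch_size_for(urls_per_second: float, target_minutes: float = 3.0) -> int:
    """Number of URLs that keeps one batch busy for about target_minutes.

    urls_per_second is an assumed, measured crawl rate for the target site;
    the 3-minute default is illustrative, not a Scrapy setting.
    """
    if urls_per_second <= 0 or target_minutes <= 0:
        raise ValueError("rate and duration must be positive")
    return max(1, math.ceil(urls_per_second * target_minutes * 60))


def make_batches(urls: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split urls into consecutive batches of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def run_with_retries(
    batches: Iterable[List[str]],
    crawl_batch: Callable[[List[str]], None],  # hypothetical per-batch crawler
    max_attempts: int = 3,
) -> List[List[str]]:
    """Crawl each batch, retrying only the batch that failed.

    Returns the batches that still failed after max_attempts tries.
    """
    failed = []
    for batch in batches:
        for _ in range(max_attempts):
            try:
                crawl_batch(batch)
                break  # batch succeeded; move on to the next one
            except Exception:  # e.g. a worker or network failure
                continue
        else:
            # The inner loop never hit `break`: every attempt failed.
            failed.append(batch)
    return failed


if __name__ == "__main__":
    # Assume a site that sustains about 20 URLs/second.
    size = batch_size_for(20, target_minutes=3)
    urls = [f"http://example.com/page_{i:06d}" for i in range(10000)]
    print(size, [len(b) for b in make_batches(urls, size)])
```

With a rate of 20 URLs/second and a three-minute target, each batch holds 3,600 URLs, so 10,000 URLs split into batches of 3,600, 3,600, and 2,800. If a machine dies mid-batch, the retry repeats at most a few minutes of crawling rather than the whole job.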
