Learning Scrapy by Dimitrios Kouzis-Loukas

Being a good citizen in a world full of spiders

There are a few things to be aware of when developing scrapers. Irresponsible web scraping can be annoying and, in some cases, even illegal. The two most important things to avoid are behavior that resembles a denial-of-service (DoS) attack and violating copyrights.

On the first point, a typical visitor might view a new page every few seconds, whereas a typical web crawler might download tens of pages per second. That is more than ten times the traffic that a typical user generates, and it might reasonably upset the website's owners. Use throttling to reduce the traffic you generate to an acceptable, user-like level. Monitor the response times, and if you see them increasing, reduce the intensity ...
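The advice to watch response times and back off can be sketched in plain Python. This is a minimal illustration, not the book's code and not Scrapy's own throttling logic: the `AdaptiveDelay` class, its parameters, and the 1.5x slowdown threshold are all hypothetical choices made for the example.

```python
from collections import deque


class AdaptiveDelay:
    """Hypothetical helper: picks the delay between requests from recent response times."""

    def __init__(self, base_delay=2.0, max_delay=60.0, window=5):
        self.base_delay = base_delay   # polite, user-like starting delay (seconds)
        self.max_delay = max_delay     # upper bound on the delay (seconds)
        self.delay = base_delay
        self.baseline = None           # average latency of the first full window
        self.recent = deque(maxlen=window)

    def record(self, response_time):
        """Record one response time (seconds) and return the delay to use next."""
        self.recent.append(response_time)
        if len(self.recent) < self.recent.maxlen:
            return self.delay          # not enough samples yet
        average = sum(self.recent) / len(self.recent)
        if self.baseline is None:
            # The first full window defines what "normal" latency looks like.
            self.baseline = average
        elif average > 1.5 * self.baseline:
            # Responses are getting slower: back off by doubling the delay.
            self.delay = min(self.delay * 2, self.max_delay)
        elif average <= self.baseline:
            # Latency is back to normal: ease toward the base delay again.
            self.delay = max(self.delay / 2, self.base_delay)
        return self.delay
```

In a crawl loop, one would time each download and sleep for the value returned by `record()` before issuing the next request. In a Scrapy project, this kind of behavior is normally configured through settings such as `DOWNLOAD_DELAY` and the AutoThrottle extension (`AUTOTHROTTLE_ENABLED`) rather than hand-written.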
