So how do you go about being a good scraper? There are several practices involved, which we will cover in this chapter:
- Start by respecting the site's robots.txt file
- Don't crawl every link you find on a site; crawl only those listed in its sitemap
- Throttle your requests. As Han Solo said to Chewbacca, "Fly casual": crawl casually, so that you don't look like you are repeatedly grabbing content
- Identify yourself so that you are known to the site
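
As a rough preview of how three of these points (robots.txt, throttling, and identification) might fit together, here is a minimal sketch using only the Python standard library. The `USER_AGENT` value, the helper names `build_parser`, `polite_delay`, and `fetch_politely`, and the one-second default delay are all assumptions made for illustration; they are not the recipes developed later in this chapter.

```python
import time
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical identity: replace with your own bot name and contact details.
USER_AGENT = "ExampleScraper/1.0 (+mailto:admin@example.com)"


def build_parser(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(parser, default=1.0):
    """Use the site's Crawl-delay if it declares one, else an assumed default."""
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default


def fetch_politely(urls, parser, opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch only the URLs robots.txt allows, pausing after each request."""
    pages = {}
    delay = polite_delay(parser)
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # robots.txt disallows this path, so skip it
        # Identify ourselves to the site through the User-Agent header.
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with opener(request) as response:
            pages[url] = response.read()
        sleep(delay)  # throttle so we do not hammer the server
    return pages
```

In practice you would download the site's robots.txt first, pass its text to `build_parser`, and then hand the resulting parser to `fetch_politely` along with the URLs you intend to crawl. The `opener` and `sleep` parameters exist only so the function can be exercised without touching the network.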