May 2017
Beginner to intermediate
220 pages
5h 2m
English
In the previous chapter, we learned how to scrape data from crawled web pages and save the results to a CSV file. What if we now want to scrape an additional field, such as the flag URL? With the crawler as it stands, we would need to download the entire website again. This is not a significant obstacle for our small example website, but other websites can have millions of web pages, which could take weeks to recrawl. One way scrapers avoid this problem is to cache crawled web pages from the start, so each page only needs to be downloaded once. In this chapter, we will cover a few ways to add caching to our web crawler.
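To make the idea concrete before going further, here is a minimal sketch of a disk-backed cache. The names `DiskCache` and `cached_download`, and the choice to hash the URL into a filename, are assumptions made for illustration; they are not the implementation developed in this chapter. The `download` argument stands in for whatever function the crawler already uses to fetch a page.

```python
import hashlib
import os


class DiskCache:
    """Hypothetical cache that stores each page's HTML in a file on disk."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url):
        # Hash the URL so any URL maps to a safe, fixed-length filename.
        name = hashlib.sha256(url.encode('utf-8')).hexdigest()
        return os.path.join(self.cache_dir, name + '.html')

    def __contains__(self, url):
        return os.path.exists(self._path(url))

    def __getitem__(self, url):
        try:
            with open(self._path(url), encoding='utf-8') as f:
                return f.read()
        except FileNotFoundError:
            # Behave like a dict: a missing entry raises KeyError.
            raise KeyError(url)

    def __setitem__(self, url, html):
        with open(self._path(url), 'w', encoding='utf-8') as f:
            f.write(html)


def cached_download(url, cache, download):
    """Return the HTML for url, calling download(url) only on a cache miss."""
    try:
        return cache[url]
    except KeyError:
        html = download(url)
        cache[url] = html
        return html
```

With a wrapper like this, scraping a new field later means rerunning the scraper against the cached files rather than the live website. The sketch deliberately ignores concerns such as cache expiry and compression.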
In this chapter, we will cover the following topics: