May 2017
Beginner to intermediate
220 pages
5h 2m
English
In this chapter, we learned that caching downloaded web pages saves time and minimizes bandwidth when recrawling a website. However, cached pages take up disk space, which compression can partly reduce. Additionally, building on top of an existing storage system, such as Redis, can help avoid speed, memory, and filesystem limitations.
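As a rough illustration of the compression idea, here is a minimal sketch of a dictionary-like cache that compresses each page before storing it. The class name `CompressedCache`, its `backend` and `level` parameters, and the example URL are assumptions made for this sketch, not code from the chapter. A plain dict stands in for the storage system, and a Redis-backed store would need its own get/set wrapper.

```python
import json
import zlib


class CompressedCache:
    """Dict-like page cache that compresses values before storing them.

    Hypothetical helper for illustration; not the chapter's implementation.
    """

    def __init__(self, backend=None, level=6):
        # backend is any mutable mapping; a plain dict stands in for a
        # real storage system such as Redis in this sketch.
        self.backend = {} if backend is None else backend
        self.level = level  # zlib compression level, 0 (none) to 9 (max)

    def __setitem__(self, url, result):
        # Serialize the result to bytes, then compress before storing.
        data = json.dumps(result).encode("utf-8")
        self.backend[url] = zlib.compress(data, self.level)

    def __getitem__(self, url):
        # Raises KeyError if the URL has not been cached.
        data = zlib.decompress(self.backend[url])
        return json.loads(data.decode("utf-8"))

    def __contains__(self, url):
        return url in self.backend


cache = CompressedCache()
page = {"html": "<p>hello</p>" * 1000, "code": 200}
cache["http://example.com/"] = page

# Compare the serialized size with the size actually stored.
raw_size = len(json.dumps(page).encode("utf-8"))
stored_size = len(cache.backend["http://example.com/"])
```

Keeping the storage behind a mapping interface means the compression step does not change when the backend does. The saving depends on the content: repetitive HTML like the sample above compresses well, while already-compressed data such as images gains little.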
In the next chapter, we will extend our crawler to download web pages concurrently so that it can crawl the web even faster.