Chapter 13. Designing a Web Crawler and Search Engine
You have planned a get-together with your loved ones for the holiday season. You love cooking and have decided to prepare all the food yourself, but you don’t have recipes for the dishes you want to make. What is the best way to solve this? You could ask your friends whether they have the recipes, or look through cookbooks, but a simple yet effective solution is to use Google Search. Google looks across the internet and returns the most relevant results on how to prepare a specific meal. How does Google sift through such a vast sea of information and find the right answer? In this chapter, we’ll answer that question by digging into the architecture of such search systems.
At a high level, the entire system consists of two subsystems: a web crawler and a search engine. A web crawler is software responsible for crawling the web. The content on the internet ...
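To make the split between the two subsystems concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the chapter's design. The "web" is a hypothetical in-memory dictionary (`WEB`) standing in for real HTTP fetches and HTML link parsing, and the function names `crawl`, `build_index`, and `search` are likewise hypothetical. The crawler performs a breadth-first traversal from seed URLs using a frontier queue and a seen set, and the search side builds a simple inverted index (word to URLs) and answers queries with AND semantics. Both are common textbook techniques chosen here for clarity.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch each URL over HTTP and parse links from the HTML.
WEB = {
    "a.com": ("holiday recipes and cooking tips", ["b.com", "c.com"]),
    "b.com": ("roast turkey recipe", ["a.com", "d.com"]),
    "c.com": ("cookbook reviews", ["d.com"]),
    "d.com": ("pumpkin pie recipe", []),
}


def crawl(seeds, max_pages=100):
    """Crawler side: breadth-first crawl from seed URLs, returning {url: text}."""
    frontier = deque(seeds)  # URLs waiting to be fetched, in FIFO order
    seen = set(seeds)        # URLs already queued, so each page is fetched once
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        if url not in WEB:   # unreachable page: skip it
            continue
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages


def build_index(pages):
    """Search-engine side: map each lowercase word to the set of URLs containing it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Copy the first posting set so intersecting does not mutate the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


if __name__ == "__main__":
    pages = crawl(["a.com"])
    index = build_index(pages)
    print(search(index, "recipe"))
```

Running the script crawls all four pages starting from `a.com` and prints `['b.com', 'd.com']`, the two pages whose text contains the word "recipe". Keeping the crawler and the indexer as separate functions mirrors the two-subsystem split: the crawler only discovers and stores pages, while the search engine only organizes and queries what was stored.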