System Design on AWS

by Jayanth Kumar, Mandeep Singh
February 2025
Intermediate to advanced
612 pages
19h 18m
English
O'Reilly Media, Inc.

Chapter 15. Designing a Web Crawler and Search Engine

You have planned a get-together with your loved ones during the holiday season. You love cooking and have decided to cook all the food yourself, but you don’t have recipes for the dishes you want to prepare. What is the best way to solve this? You could ask your friends for their recipes or look through cookbooks, but a simple yet effective solution is a Google search. Google looks across the internet and returns the most relevant results for how to prepare a specific dish. How does Google sift through such a vast sea of information and find a fitting answer? In this chapter, we’ll try to figure this out by digging into the architecture of such search systems.

At a high level, the system consists of two subsystems: a web crawler and a search engine, as shown in Figure 15-1. A web crawler is software that traverses web content, following links from page to page to discover and fetch it. Content on the internet is growing rapidly and existing pages change, so web crawlers need to recrawl regularly to keep results up to date. The search engine sits on top of the content accumulated by the web crawlers and stores it in a form, typically an index, that lets it look up user-searched keywords and present the most useful results.
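
To make the split between the two subsystems concrete, here is a minimal sketch in Python. It is not the architecture the chapter develops. It assumes a tiny in-memory "web" (the hypothetical `FAKE_WEB` dictionary) in place of real HTTP fetching, and a plain inverted index with no ranking. The names `crawl`, `build_index`, and `search` are illustrative only.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# Assumption for illustration; a real crawler would fetch pages over HTTP.
FAKE_WEB = {
    "a.example/home": ("easy pasta recipe", ["a.example/soup", "b.example/cake"]),
    "a.example/soup": ("tomato soup recipe", ["a.example/home"]),
    "b.example/cake": ("chocolate cake", []),
}


def crawl(seed, web):
    """Web crawler: breadth-first traversal from a seed URL.

    Returns {url: text} for every page reachable from the seed.
    """
    seen = {seed}
    queue = deque([seed])
    pages = {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            # Skip unknown URLs and pages already scheduled for a visit.
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Search engine storage: inverted index mapping word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query, sorted by URL."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


pages = crawl("a.example/home", FAKE_WEB)
index = build_index(pages)
print(search(index, "recipe"))         # ['a.example/home', 'a.example/soup']
print(search(index, "tomato recipe"))  # ['a.example/soup']
```

The crawler only produces content, and the search engine only consumes it, so the two can be scaled and scheduled independently. A production system would also need politeness rules, deduplication, recrawl scheduling, and result ranking, none of which this sketch attempts.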

With this basic understanding, let’s start by gathering the functional and nonfunctional requirements of the proposed system.

Figure 15-1. Ten-thousand-foot view of the web crawler and search engine architecture
Publisher Resources

ISBN: 9781098146887