Algorithm-Based Ranking Systems: Crawling, Indexing, and Ranking

Understanding how crawling, indexing, and ranking work is helpful to SEO practitioners, as it helps them determine what actions to take to meet their goals. This section primarily covers the way Google and Bing operate, and does not necessarily apply to other popular search engines, such as Yandex (Russia), Baidu (China), Seznam (Czech Republic), and Naver (South Korea).

Search engines must execute multiple tasks very well to provide relevant search results. Put simply, you can think of these as:

  • Crawling and indexing billions of documents (pages and files) on the Web (note that they ignore pages that they consider to be “insignificant,” perhaps because they are perceived as adding no new value or are not referenced at all on the Web)

  • Responding to user queries by providing lists of relevant pages

In this section, we’ll walk through the basics of these functions from a nontechnical perspective, starting with how search engines find and discover content.

Crawling and Indexing

To offer the best possible results, search engines must attempt to discover all the public pages on the World Wide Web and then present the ones that best match the user’s search query. The first step in this process is crawling the Web. The search engines start with a seed set of sites that are known to be of very high quality, and then visit the links on each page of those sites to discover other web ...
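
The seed-and-follow-links idea described above can be illustrated with a small sketch. This is not how Google or Bing actually implement crawling; it is a minimal breadth-first traversal over a made-up, in-memory link graph. The names TOY_WEB, crawl, and toy_get_links, and all of the example.* URLs, are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would fetch pages over HTTP and parse links out of the HTML.
TOY_WEB = {
    "https://seed.example/": ["https://seed.example/a", "https://other.example/"],
    "https://seed.example/a": ["https://seed.example/", "https://deep.example/"],
    "https://other.example/": ["https://deep.example/"],
    "https://deep.example/": [],
    # No page links to this one, so a link-following crawl never reaches it.
    "https://orphan.example/": ["https://seed.example/"],
}


def toy_get_links(url):
    """Return the outgoing links of a page in the toy web (empty if unknown)."""
    return TOY_WEB.get(url, [])


def crawl(seeds, get_links, max_pages=1000):
    """Discover pages breadth-first from the seed URLs; return them in visit order."""
    queue = deque(seeds)
    seen = set(seeds)      # URLs already queued, so each page is visited once
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


discovered = crawl(["https://seed.example/"], toy_get_links)
for url in discovered:
    print(url)
```

In this toy graph the crawl reaches four pages from the single seed, while the orphan page is never discovered because nothing links to it. That mirrors the earlier point that pages not referenced anywhere on the Web may never be crawled or indexed.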
