Chapter 4. SEARCHING THE WEB

"The internet has been the most fundamental change during my lifetime and for hundreds of years."

Rupert Murdoch, Media owner

We introduce web search by presenting the major search engines that are battling for our clicks. We look at some statistics derived from search engine log files, which give us insight into how users employ search engines to answer their queries. We describe the components of a search engine and how search engines use special crawling software to collect data from web pages and maintain a fresh index that covers as much of the Web as possible.

CHAPTER OBJECTIVES

  • Detail the various aspects of a typical search session.

  • Raise the political issues that arise from search engines being the primary information gatekeepers of the Web.

  • Introduce the main competitors, Google, Yahoo, and Bing, involved in the ongoing search engine wars to dominate the web search space.

  • Present some search engine statistics generated from studies of query logs.

  • Explain the implications of using different query syntax on search results.

  • Present the most popular search keywords found from studies of query logs.

  • Present a generic architecture for a search engine and discuss its various components.

  • Explain how the search index organizes the text found in web pages into an inverted-file data structure.

  • Explain how hyperlink information pertaining to URLs is stored in a link database.

  • Explain the roles of the query engine and how it interfaces between the search ...
