CHAPTER 3 WEB SEARCH AND INFORMATION RETRIEVAL

This chapter discusses how Web search engines work. Search engines have their roots in information retrieval (IR) systems, which prepare a keyword index for the given corpus and respond to keyword queries with a ranked list of documents. The query language provided by most search engines lets us look for Web pages that contain (or do not contain) specified words and phrases. Conjunctions and disjunctions of such clauses are also permitted. Mature IR technology predates the Web by at least a decade. One of the earliest applications of rudimentary IR systems to the Internet was Archie, which supported title search across sites serving files over the File Transfer Protocol (FTP). It was only in the ...
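The keyword index and the Boolean query language described above can be illustrated with a minimal sketch. The function names `build_index` and `search`, the three-document toy corpus, and the whitespace tokenizer are assumptions made for this illustration and do not come from the text. The sketch covers only containment, exclusion, conjunction, and disjunction; it does not attempt phrase matching or ranking.

```python
from collections import defaultdict


def build_index(corpus):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        # Naive whitespace tokenization; a real system would do much more.
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, all_ids, must=(), must_not=(), any_of=()):
    """Return sorted ids that contain every `must` word, no `must_not` word,
    and, if `any_of` is given, at least one of its words."""
    result = set(all_ids)
    for word in must:          # conjunction: intersect posting sets
        result &= index.get(word, set())
    for word in must_not:      # exclusion: subtract posting sets
        result -= index.get(word, set())
    if any_of:                 # disjunction: union, then intersect
        hits = set()
        for word in any_of:
            hits |= index.get(word, set())
        result &= hits
    return sorted(result)


# Hypothetical toy corpus, keyed by document id.
corpus = {
    1: "web search engines crawl and index pages",
    2: "classic information retrieval systems index a fixed corpus",
    3: "ftp sites serve files",
}
index = build_index(corpus)
print(search(index, corpus, must=["index"], must_not=["web"]))  # prints [2]
```

Storing a set of document ids per word turns each query clause into a set operation, which is why conjunctions, disjunctions, and exclusions compose so easily in this kind of index.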
