Mining the Web by Soumen Chakrabarti

CHAPTER 3 WEB SEARCH AND INFORMATION RETRIEVAL

This chapter discusses how Web search engines work. Search engines have their roots in information retrieval (IR) systems, which prepare a keyword index for the given corpus and respond to keyword queries with a ranked list of documents. The query language provided by most search engines lets us look for Web pages that contain (or do not contain) specified words and phrases. Conjunctions and disjunctions of such clauses are also permitted. Mature IR technology predates the Web by at least a decade. One of the earliest applications of rudimentary IR systems to the Internet was Archie, which supported title search across sites serving files over the File Transfer Protocol (FTP). It was only in the ...
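The keyword index and the Boolean query clauses described at the start of this chapter can be illustrated with a small sketch. It is not taken from the book: the function names (`tokenize`, `build_index`, `search`) and the three-document corpus are hypothetical, and the sketch leaves out phrase matching and ranking, returning matches in document-id order.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(corpus):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, all_docs, must=(), must_not=(), any_of=()):
    """Answer a Boolean keyword query.

    must     -- every term must appear (conjunction)
    any_of   -- at least one term must appear (disjunction)
    must_not -- none of these terms may appear (negation)
    """
    result = set(all_docs)
    for term in must:
        result &= index.get(term, set())
    if any_of:
        result &= set().union(*(index.get(t, set()) for t in any_of))
    for term in must_not:
        result -= index.get(term, set())
    return sorted(result)


# Hypothetical toy corpus, for illustration only.
corpus = {
    1: "Archie indexed file names on FTP sites",
    2: "Web search engines rank pages for keyword queries",
    3: "Information retrieval systems build a keyword index",
}
index = build_index(corpus)
```

For example, `search(index, corpus, must=["keyword"], must_not=["web"])` keeps only the documents that contain "keyword" but not "web". A real engine would also score and order the matches, which this sketch does not attempt.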
