Handbook of Data Structures and Applications, 2nd Edition by Sartaj Sahni, Dinesh P. Mehta

51

Data Structures in Web Information Retrieval*

Monika Henzinger

University of Vienna

51.1 Introduction

51.2 Inverted Indices

Index Compression

Index Granularity

51.3 Fingerprints

51.4 Finding Near-Duplicate Documents

51.5 Conclusions

References

51.1 Introduction

Current search engines process thousands of queries per second over a collection of billions of web pages with a sub-second average response time. There are two reasons for this astonishing performance: massive parallelism and a simple yet efficient data structure called an inverted index.
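To make the term concrete, here is a minimal sketch of an inverted index in Python. It is an illustration under simplifying assumptions (whitespace tokenization, lowercase terms, conjunctive queries, everything held in memory), not the structure a production search engine uses; the names build_inverted_index and search are hypothetical and do not come from the chapter.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumption: terms are lowercase, whitespace-separated tokens.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        # Intersect posting lists: a document must contain all terms.
        result &= set(index.get(term, []))
    return sorted(result)


docs = [
    "web search engines index billions of pages",
    "an inverted index maps terms to documents",
    "search engines answer queries in parallel",
]
index = build_inverted_index(docs)
print(search(index, "search engines"))  # [0, 2]
print(search(index, "index"))           # [0, 1]
```

A query touches only the posting lists of its own terms rather than scanning every document, which is what makes the structure attractive for large collections.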

In this chapter we will describe inverted indices. The parallelism deployed by search engines is quite straightforward: Given a collection of documents and a user query, the goal of information retrieval ...
