On the Efficient Determination of Most Near Neighbors, 2nd Edition by Mark S. Manasse

CHAPTER 4

Uniform Sampling after Alta Vista

In the 15 years since we worked on Alta Vista, during which we applied these techniques to Bing and to later work, we have discovered a few ways to compute consistent random samples from streams, typically using fewer random bits. This chapter considers some of these approaches, which deliver improvements ranging from a factor of four to a factor of roughly 20. Such improvements are valuable, since the indexed size of the web has grown from tens of millions of pages to tens of billions in the intervening years.

4.1 USING LESS RANDOMNESS TO IMPROVE SAMPLING EFFICIENCY

Restricting our attention to finite pools of features, as in the code samples above, we are no longer ...
