On the Efficient Determination of Most Near Neighbors, 2nd Edition by Mark S. Manasse

CHAPTER 7

Forks in the Road: Flajolet and Slightly Biased Sampling

As mentioned in the Forward (sic, as described on page xv), shortly after the initial version of this publication was placed in the hands of my publisher, I received a disconcerting preprint from Ping Li: he and his students had come up with a way to produce a sketch for unweighted sampling without replacement in time linear in the length of the document. Their paper proposed producing a sketch which might sometimes be underfull: in particular, for short documents, there might be (or must be, for very short documents) sample slots with no generated selection, particularly since it proposed sampling without replacement. Subsequent papers by Ping Li and other colleagues found unbiased ...
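To make the notion of an underfull sketch concrete, here is a minimal Python sketch of one way a single linear pass could fill a fixed number of sample slots. It is an illustration under stated assumptions, not necessarily the construction in the preprint: the function names `binned_min_sketch` and `_hash64`, the use of BLAKE2b as the hash, and the rule that assigns a token to the slot given by its hash modulo the slot count are all hypothetical choices made for this example. Each token is hashed once, lands in exactly one slot, and survives only if it has the smallest rank seen in that slot, so no token can be selected twice and a short document necessarily leaves some slots empty.

```python
import hashlib


def _hash64(token: str) -> int:
    """Hypothetical helper: map a token to a 64-bit integer via BLAKE2b."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def binned_min_sketch(tokens, num_slots):
    """Build a fixed-size sketch in one pass over the tokens.

    Assumed scheme (for illustration only): a token's hash picks its slot
    (hash mod num_slots) and its rank within that slot (hash div num_slots).
    Each slot keeps the token of minimum rank. Slots that no token maps to
    stay None, which is what makes the sketch underfull.
    """
    slots = [None] * num_slots
    for token in tokens:
        h = _hash64(token)
        slot, rank = h % num_slots, h // num_slots
        candidate = (rank, token)
        # Repeated tokens hash identically, so duplicates change nothing:
        # the sampling is unweighted.
        if slots[slot] is None or candidate < slots[slot]:
            slots[slot] = candidate
    return [None if entry is None else entry[1] for entry in slots]


short_doc = ["most", "near", "neighbors"]
sketch = binned_min_sketch(short_doc, num_slots=8)
empty_slots = sum(1 for entry in sketch if entry is None)
```

With three distinct tokens and eight slots, at least five slots must remain empty however the hashes fall, which is the "must be, for very short documents" case described above. How such empty slots should be treated when two sketches are compared is a separate question that this example does not address.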
