8 Approximate quantiles on data streams

This chapter covers

Reviewing the concept of exact quantiles and understanding the constraints imposed by the streaming-data context
Understanding the different types of error for approximate quantiles
Applying the t-digest and q-digest algorithms to a data stream
Comparing t-digest and q-digest on realistic data about the length of visits to a website

The algorithms presented in the previous chapter allow us to select an (un)biased sample from all the data tuples that have arrived up to the current moment. In a way, a sample is a very flexible data sketch: you form it once, and you can then use its mean, or any other feature, as an estimate of that same feature over all the data seen on the stream so far. ...
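To make the idea of a sample as a general-purpose sketch concrete, here is a minimal Python sketch. It is not taken from the book: it assumes a classic reservoir sampler as the sampling algorithm, and the function name `reservoir_sample`, the sample size of 1,000, and the synthetic exponentially distributed "visit lengths" are all illustrative assumptions. It forms one sample and then reuses it to estimate both the mean and the median of the stream.

```python
import random
import statistics


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of up to k items from a stream.

    Illustrative helper (hypothetical name): a plain reservoir sampler
    that sees each item once and stores at most k items.
    """
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                sample[j] = item
    return sample


rng = random.Random(42)

# Synthetic data, assumed for illustration only: visit lengths in
# seconds drawn from an exponential distribution with mean 60.
stream = [rng.expovariate(1 / 60.0) for _ in range(100_000)]

# Form the sample once...
sample = reservoir_sample(stream, 1_000, rng)

# ...then reuse it to estimate several features of the whole stream.
exact_mean = statistics.fmean(stream)
approx_mean = statistics.fmean(sample)
exact_median = statistics.median(stream)
approx_median = statistics.median(sample)

print(f"mean:   exact={exact_mean:.2f}  from sample={approx_mean:.2f}")
print(f"median: exact={exact_median:.2f}  from sample={approx_median:.2f}")
```

The exact values here are computable only because the whole synthetic stream is held in memory for comparison; on a real stream, only the 1,000-item sample would be stored.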

Algorithms and Data Structures for Massive Datasets by Emin Tahirovic, Dzejla Medjedovic, Ines Dedovic
