8 Approximate quantiles on data streams
This chapter covers
- Reviewing the concept of exact quantiles and understanding constraints imposed by streaming data context
- Understanding different types of errors for approximate quantiles
- Applying t-digest and q-digest algorithms to a data stream
- Comparing t-digest and q-digest on realistic data on length of visits to a website
Different algorithms presented in the previous chapter allow us to select an (un)biased sample from all data-tuples that have arrived up to the current moment. In a way, a sample is a very flexible datasketch: you form it once, and you can then use it to claim that its mean, or any other feature, is a good estimate of that same feature of all the data from the stream so far. ...
Get Algorithms and Data Structures for Massive Datasets now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.