5 Cardinality estimation and HyperLogLog
This chapter covers
- Practical use cases for space-efficient cardinality estimation algorithms
- The incremental development of ideas leading up to and including HyperLogLog, such as probabilistic counting and LogLog
- How HyperLogLog works, its space and error requirements, and where it is used
- How different cardinality estimates behave on large data, shown through a simulation experiment
- Insights into practical implementations of HyperLogLog
Determining the cardinality of a multiset (a set with duplicates), that is, the number of distinct elements it contains, is a common problem across software development, especially in applications involving databases and network traffic. However, since the expansion of ...