Chapter 12

Streaming Algorithms for Big Data Processing on Multicore Architecture

Marat Zhanikeev

Abstract

This chapter brings together three topics: hash functions, Bloom filters, and the recently emerged streaming algorithms. Hashing is the oldest of the three and is backed by an extensive literature. Bloom filters are built on hash functions and benefit directly from hashing efficiency. Streaming algorithms use both Bloom filters and hashing in various ways but impose strict performance requirements. This chapter examines the three topics in terms of efficiency and speed. The two main performance metrics are per-unit processing time and the size of the memory footprint. All algorithms are presented as C/C++ pseudocode. Specific attention ...
