Algorithms and Data Structures for Massive Datasets
by Dzejla Medjedovic, Emin Tahirovic, Ines Schweigert
Part 2: Real-time analytics
Thus far, we haven’t been concerned with the form in which massive data arrives at our disposal. All the algorithms we have covered so far can be applied to continuously arriving data as well as to historical data residing in a big database system. The three chapters in part 2 present algorithms and data structures (sketches) whose design and application context were driven by the continuous arrival of data tuples, referred to as data streams. Because the data at hand is transient, algorithms have to operate efficiently and incorporate knowledge about the stream after each tuple is seen. We achieve this by keeping sketches of a data stream. Some of them, like random samples, are general ...
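To make "keeping a sketch that is updated after each tuple" concrete, here is a minimal sketch of one well-known way to maintain a uniform random sample over a stream: reservoir sampling (Algorithm R). This is an illustrative choice, not necessarily the exact method the chapters use; the function name `reservoir_sample` and its parameters are hypothetical. The sketch holds only `k` items in memory no matter how long the stream is, and each arriving tuple is processed once and then discarded.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical helper: return a uniform random sample of up to k items
    from a stream of unknown length, using O(k) memory (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k tuples.
            reservoir.append(item)
        else:
            # The (i+1)-th tuple is kept with probability k / (i + 1):
            # randint is inclusive, so j is uniform over i + 1 positions.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item  # evict a uniformly chosen old sample
    return reservoir


# Usage: sample 5 tuples from a stream of one million without storing it.
sample = reservoir_sample(range(1_000_000), 5, random.Random(42))
```

After processing any prefix of the stream, every tuple seen so far has the same probability of being in the reservoir, so the sample can be queried at any moment, which is exactly the property a stream setting needs.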