7 Sampling from data streams

This chapter covers

  • Sampling from an infinite landmark stream
  • Incorporating recency with a sliding window, and sampling from that window
  • Showcasing the difference between representative and biased sampling strategies on a landmark stream with a sudden shift
  • Exploring R and Python packages and libraries for writing and executing tasks on data streams

We are now ready to treat sampling as a single task staged in the analysis tier. We have already shown that this division of the streaming data architecture is not so clear-cut, but here we will imagine the stream processor sampling the incoming stream in this tier. This lets us introduce the sampling algorithm without any additional complexity coming from ...
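The excerpt breaks off before naming the algorithm it introduces. As an illustration only, the sketch below shows reservoir sampling (often called Algorithm R). It is a well-known way to keep a fixed-size uniform sample over a landmark stream whose length is not known in advance. We assume, but this excerpt does not confirm, that this is the kind of technique the chapter goes on to develop. The function name `reservoir_sample` and its parameters are hypothetical, chosen for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical helper: uniform sample of up to k items from a landmark stream.

    Landmark stream: every item since the start of the stream is eligible.
    Memory use is O(k) regardless of how many items arrive.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):  # i is the 0-based position of the item
        if i < k:
            reservoir.append(item)     # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)      # uniform over the i + 1 positions seen so far
            if j < k:
                reservoir[j] = item    # keep the new item with probability k / (i + 1)
    return reservoir


# Usage: a 10-item sample from a million-item stream, consumed one item at a time.
sample = reservoir_sample(range(1_000_000), 10, random.Random(42))
print(len(sample), sample)
```

After n items, each of them sits in the reservoir with probability k/n, which is what makes the sample representative of the whole landmark stream. The same property means the sample gives no extra weight to recent items, which is the limitation a sliding window is meant to address.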
