Chapter 1. Introduction
Until recently, big data systems have been batch oriented, where data is captured in distributed filesystems or databases and then processed in batches or studied interactively, as in data warehousing scenarios. Now, exclusive reliance on batch-mode processing, where data arrives without immediate extraction of valuable information, is a competitive disadvantage.
Hence, big data systems are evolving to be more stream oriented, where data is processed as it arrives, leading to so-called fast data systems that ingest and process continuous, potentially infinite data streams.
Ideally, such systems still support batch-mode and interactive processing, because traditional uses, such as data warehousing, haven’t gone away. In many cases, we can rework batch-mode analytics to use the same streaming infrastructure, where the streams are finite instead of infinite.
In this report I’ll begin with a quick review of the history of big data and batch processing, then discuss how the changing landscape has fueled the emergence of stream-oriented fast data architectures. Next, I’ll discuss hallmarks of these architectures and some specific tools available now, focusing on open source options. I’ll finish with a look at an example IoT (Internet of Things) application.
A Brief History of Big Data
The emergence of the Internet in the mid-1990s induced the creation of data sets of unprecedented size. Existing tools were neither scalable enough for these data sets nor cost ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access