Preface
For data engineers and data scientists, there’s never a shortage of technologies that are competing for our attention. Whether we’re perusing our favorite subreddits, scanning Hacker News, reading tech blogs, or weaving through hundreds of tables at a tech conference, there are so many things to look at that it can start to feel overwhelming.
But if we can find a quiet corner to just think for a minute, and let all of the buzz fade into the background, we can start to distinguish patterns from the noise. You see, we live in the age of explosive data growth, and many of these technologies were created to help us store and process data at scale. We’re told that these are modern solutions for modern problems, and we sit around discussing “big data” as if the idea is avant-garde, when really the focus on data volume is only half the story.
Technologies that only solve for the data volume problem tend to have batch-oriented techniques for processing data. This involves running a job on some pile of data that has accumulated for a period of time. In some ways, this is like trying to drink the ocean all at once. With modern computing power and paradigms, some technologies actually manage to achieve this, though usually at the expense of high latency.
Instead, there’s another property of modern data that we focus on in this book: data moves over networks in steady and never-ending streams. The technologies we cover in this book, Kafka Streams and ksqlDB, are specifically designed ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access