Chapter 2. Stream-First Architecture

There is a revolution underway in how people design their data architecture, not just for real-time or near-real-time projects, but in a larger sense as well. The change is to think of stream-based data flow as the heart of the overall design, rather than as the basis only for specialized work. Understanding the motivations for this transformation to a stream-first architecture helps put Apache Flink and its role in modern data processing into context.

Flink, as part of a newer breed of systems, helps broaden the scope of the term “data streaming” well beyond real-time, low-latency analytics. The term now encompasses a wide variety of data applications, including those currently handled by stream processors, those handled by batch processors, and even some stateful applications that are executed by transactional databases.

As it turns out, the data architecture needed to put Flink to work effectively is also the basis for gaining broader advantages from working with streaming data. To understand how this works, we will take a closer look at how to build the pipeline to support Flink for stream processing. But first, let’s address the question of what is to be gained from working with a stream-focused architecture instead of the more traditional approach.

Traditional Architecture versus Streaming Architecture

Traditionally, the typical architecture of a data backend has employed a centralized database system to hold the transactional data of the business. ...
