Chapter 6. Consistency

If you’re familiar with databases, you take consistency for granted. You know that the results of your queries are going to be consistent with the input data. Now imagine you dare to cross the bridge from the database to the streaming world. Can you bank on similar consistency guarantees here, even with the additional complexity of data arriving late and out of order, as well as the emphasis on low latency and high throughput?

For classical stream processors, the answer is no. They guarantee a weaker form of consistency called eventual consistency. For classical stream processing use cases, often involving aggregations on windowed data, eventual consistency is a perfect fit, and it enables data pipelines with ultra-low latency, very high throughput, and extremely large scale. The problem is that if you come from the database world, eventual consistency can be confusing and counterintuitive—especially in combination with nonwindowed data.

In this chapter, we will use a toy example from the banking domain to demonstrate what can go wrong in eventually consistent stream processors like Flink, ksqlDB, and Proton if you just follow your intuitions from the database world.

Interestingly, some more recent stream processing systems support a stronger form of consistency, where every output is the correct output for a subset of the inputs: internal consistency.1 Of these stream processing systems, we put RisingWave, Materialize, ...
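To make the distinction concrete before we dive in, here is a minimal, hypothetical sketch in Python—not the chapter's actual banking example, and not any particular engine's behavior. We assume two accounts with fixed starting balances and a stream of transfers between them, so the total balance should never change. An eventually consistent processor may propagate the debit and the credit of a transfer as separate updates, so a reader can observe a total that was never true for any prefix of the input; an internally consistent processor only emits outputs that correspond to some prefix of the input.

```python
# Hypothetical illustration only: two accounts, a stream of transfers, and the
# invariant that the total balance stays constant at 200.

transfers = [("alice", "bob", 10), ("bob", "alice", 30)]

def eventually_consistent_totals(transfers):
    """Apply debit and credit as separate updates; emit the total after each one."""
    balances = {"alice": 100, "bob": 100}
    totals = []
    for src, dst, amount in transfers:
        balances[src] -= amount
        totals.append(sum(balances.values()))  # debit applied, credit not yet visible
        balances[dst] += amount
        totals.append(sum(balances.values()))
    return totals

def internally_consistent_totals(transfers):
    """Apply each transfer atomically; every emitted total reflects a prefix of the input."""
    balances = {"alice": 100, "bob": 100}
    totals = []
    for src, dst, amount in transfers:
        balances[src] -= amount
        balances[dst] += amount
        totals.append(sum(balances.values()))
    return totals

print(eventually_consistent_totals(transfers))   # [190, 200, 170, 200] -- transient totals that were never true
print(internally_consistent_totals(transfers))   # [200, 200] -- the invariant holds for every output
```

The transient values 190 and 170 are exactly the kind of output an eventually consistent system can expose between updates; an internally consistent system never emits them.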
