Chapter 5. Introduction to Streaming Databases

In a spreadsheet, you can put a formula in one cell (for example, the sum of cells in another column), and whenever any input to the formula changes, the result of the formula is automatically recalculated. This is exactly what we want at a data system level: when a record in a database changes, we want any index for that record to be automatically updated, and any cached views or aggregations that depend on the record to be automatically refreshed. You should not have to worry about the technical details of how this refresh happens, but be able to simply trust that it works correctly.

Martin Kleppmann, Designing Data-Intensive Applications

In the previous chapter, we learned how to “turn the database inside out,” as Martin Kleppmann so aptly put it. This involved externalizing the database's write-ahead log (WAL) into input change streams, creating materialized views on top of them, and writing the processed data back into output change streams. Unlike materialized views in classic databases such as Oracle or Postgres, where refresh intervals range from a few minutes to a few hours, materialized views in stream processing platforms like Flink, Kafka Streams, ksqlDB, or Samza can be refreshed continuously, with every new change coming in.
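To make the idea of a continuously refreshed materialized view concrete, here is a minimal sketch in plain Python. It does not use the API of Flink, Kafka Streams, ksqlDB, or Samza. The `Change` event shape, the `TotalByCustomer` view, and the sample orders are all hypothetical, chosen only for illustration. Each change event carries the row before and after the change, the view adjusts its aggregate incrementally instead of recomputing it from scratch, and the updated totals are collected as an output change stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One hypothetical change event, loosely modeled on a WAL entry."""
    key: str
    before: Optional[dict]  # row before the change, None for an insert
    after: Optional[dict]   # row after the change, None for a delete


class TotalByCustomer:
    """Hypothetical materialized view: sum of order amounts per customer."""

    def __init__(self):
        self.totals = defaultdict(int)

    def apply(self, change):
        """Apply one change incrementally and return the output changes."""
        affected = []
        if change.before is not None:
            # Retract the old row's contribution to the aggregate.
            customer = change.before["customer"]
            self.totals[customer] -= change.before["amount"]
            affected.append(customer)
        if change.after is not None:
            # Add the new row's contribution to the aggregate.
            customer = change.after["customer"]
            self.totals[customer] += change.after["amount"]
            if customer not in affected:
                affected.append(customer)
        # Emit the new total for every customer touched by this change.
        return [(customer, self.totals[customer]) for customer in affected]


# Input change stream: two inserts, one update, one delete.
input_stream = [
    Change("o1", None, {"customer": "alice", "amount": 30}),
    Change("o2", None, {"customer": "bob", "amount": 20}),
    Change("o1", {"customer": "alice", "amount": 30},
           {"customer": "alice", "amount": 50}),
    Change("o2", {"customer": "bob", "amount": 20}, None),
]

view = TotalByCustomer()
output_stream = []
for change in input_stream:
    # The view is refreshed with every change, not on a timer.
    output_stream.extend(view.apply(change))

print(dict(view.totals))
print(output_stream)
```

The view is never rebuilt: an update is handled by retracting the old row's contribution and adding the new one. A real stream processor would also have to handle concerns this sketch ignores, such as ordering, fault tolerance, and state storage.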

The idea of “turning the database inside out” empowered us to build materialized views offering fresher data than ever before. However, compared to a simple classic database installation, it also ...

Streaming Databases by Hubert Dulay, Ralph Matthias Debusmann
