Chapter 2. Emerging Architecture Patterns

New architecture patterns have emerged to address the myriad changes in the data landscape and the gap between new requirements and old architectures. In this chapter, we first cover three foundational patterns: event sourcing, stateful streaming, and declarative data pipelines. We then discuss how these can be combined with cloud object storage to upgrade the data lake into a database for modern data. We close the chapter with a discussion of the emerging data mesh paradigm and the related concept of data as a product.

Event Sourcing for Analytics Pipelines

CRUD (create, read, update, delete) is the dominant operating model for databases, but it has two key limitations when applied to high-volume, high-velocity data. First, it lacks workload isolation: writes and reads compete for the same database state, which limits how far the system can scale. Second, CRUD updates overwrite data in place and keep only the current state, so there is no way to maintain or audit data lineage; you lose track of earlier versions and of the changes that led to the current version. Without that history, you also can’t “go back in time” to a previous state and reprocess data, a useful capability for backtesting a research hypothesis or retroactively changing pipeline logic to address a bug or new requirement.
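The second limitation is easier to see in a small example. The following Python sketch is illustrative only and is not taken from any particular database or framework: the `Event` record, the `crud_update` and `replay` helpers, and the `sensor-1` key are all hypothetical names chosen for this example. It contrasts an in-place update, which discards the previous value, with an append-only log of changes, from which state as of any earlier point can be rebuilt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable change record (hypothetical schema for illustration)."""
    ts: int      # logical timestamp of the change
    key: str     # entity being changed
    value: int   # new value for that entity


def crud_update(table: dict, key: str, value: int) -> None:
    """CRUD-style update: overwrite in place; the old value is lost."""
    table[key] = value


def replay(log: list, as_of: int) -> dict:
    """Rebuild state by applying, in order, every event with ts <= as_of."""
    state = {}
    for event in sorted(log, key=lambda e: e.ts):
        if event.ts > as_of:
            break
        state[event.key] = event.value
    return state


# CRUD: only the latest value survives the second write.
table = {}
crud_update(table, "sensor-1", 10)
crud_update(table, "sensor-1", 25)

# Append-only log: every change is kept as its own record.
log = [
    Event(ts=1, key="sensor-1", value=10),
    Event(ts=2, key="sensor-1", value=25),
]

current = replay(log, as_of=2)  # same answer the CRUD table gives
earlier = replay(log, as_of=1)  # state before the second change
```

Here `table` and `current` agree on the latest value, but only the log can answer what the value was at `ts=1`. Reprocessing with changed logic amounts to running a different fold over the same log.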

Event sourcing (see Figure 2-1) is an alternative to CRUD that addresses these issues. Event sourcing is an architectural pattern in which all changes to the state ...
