Chapter 8. Zero-ETL or Near-Zero-ETL

In Chapter 7, we introduced emerging hybrid databases that offer an alternative way to support real-time analytics. These systems reduce the amount of infrastructure to operate and make data more accessible to analytical workloads. Because hybrid systems converge components that are traditionally distributed, they are often assumed to lean toward a monolithic design. Monolithic systems are generally known for lacking the modularity and scalability that data workloads need.

Ironically, breaking up a monolithic data system returns us to decomposing the database, turning it inside out so that each component can be scaled individually. This isn’t necessarily a bad solution. In this book, however, we have been proposing to put these components back into the database to reduce the complexity and cost traditionally associated with large distributed systems.

ETL is how we move data around from system to system, transforming it along the way. So far, we have used a form of ETL called streaming SQL. In this chapter, we will talk about how to balance complexity and scalability in the implementation of ETL by taking a look at existing systems and patterns used today to distribute and scale data workloads.
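To make the definition concrete, here is a minimal sketch of the three ETL steps in plain Python. It is only an illustration under stated assumptions: the order rows, the field names, and the in-memory `analytics_table` are hypothetical stand-ins for a real source and target system, and the batch-style pipeline is not the streaming SQL used earlier in the book.

```python
from typing import Iterable, Iterator

# Hypothetical source data standing in for an operational system.
SOURCE_ORDERS = [
    {"order_id": 1, "amount_cents": 1250, "status": "paid"},
    {"order_id": 2, "amount_cents": 400, "status": "cancelled"},
    {"order_id": 3, "amount_cents": 9900, "status": "paid"},
]


def extract(source: Iterable[dict]) -> Iterator[dict]:
    """Extract: read rows out of the source system."""
    yield from source


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Transform: keep paid orders and convert cents to dollars."""
    for row in rows:
        if row["status"] != "paid":
            continue
        yield {
            "order_id": row["order_id"],
            "amount_dollars": row["amount_cents"] / 100,
        }


def load(rows: Iterable[dict], target: list) -> int:
    """Load: write the transformed rows into the target system."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count


# Hypothetical target standing in for an analytical store.
analytics_table: list = []
loaded = load(transform(extract(SOURCE_ORDERS)), analytics_table)
```

Each step is a separate function, so the pipeline can be split across systems or collapsed into one. That choice is the trade-off between complexity and scalability that this chapter examines.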

ETL Model

Figure 8-1 shows existing ETL solutions from no ETL in HTAP databases at the top to the turn-the-database-inside-out distributed solution at the bottom. The lower the solution is on the triangle, the more distributed and complex it becomes. ...
