Chapter 3. Data Orchestration

Though we’ve already discussed ingestion (E, L) and transformation (T), we’ve only scratched the surface of ETL. Rather than being merely a series of discrete steps, data pipelines are shaped by overarching mechanisms that operate on a meta level, aptly dubbed “undercurrents” by Matt Housley and Joe Reis in Fundamentals of Data Engineering:

  • Security

  • Data management

  • Data operations (DataOps)

  • Data architecture

  • Data orchestration

  • Software engineering

In this chapter, we’ll explore dependency management and pipeline orchestration, touching on the history of orchestrators, which is important for understanding why certain methods of orchestration are popular today. We’ll present a menu of options for you to orchestrate your own data workflows and discuss some common design patterns in orchestration.

Throughout, we’ll discuss how an “orchestrator” has historically been separate from a “transformation” tool. We’ll touch on why this has been true and why it might not be true in the future, though we still believe a separate orchestrator is the preferred approach.

What Is Data Orchestration?

Every workflow, data or not, requires sequential steps: attempting to use a French press without heating water will only brew disappointment, whereas poorly sequenced data transformations might brew a storm far more bitter than a caffeine-deprived morning (though the woes of the decaffeinated are not to be trivialized). In data, these “steps” are often ...
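To make the idea of dependency-ordered steps concrete, here is a minimal sketch (not from the chapter) of what an orchestrator does at its core: it resolves a graph of dependencies and runs each step only after its upstream steps have finished. The step names and bodies are hypothetical placeholders, and Python’s standard-library graphlib is used purely for illustration.

  from graphlib import TopologicalSorter

  # Hypothetical placeholder steps standing in for real extract/transform/load logic.
  def extract():
      print("extract: pull raw orders from the source system")

  def transform():
      print("transform: clean and model the raw orders")

  def load():
      print("load: write the modeled orders to the warehouse")

  # Each step maps to the set of steps it depends on. Resolving this graph and
  # executing the steps in a valid order is the core job of an orchestrator.
  dependencies = {
      "extract": set(),
      "transform": {"extract"},
      "load": {"transform"},
  }
  runners = {"extract": extract, "transform": transform, "load": load}

  for step in TopologicalSorter(dependencies).static_order():
      runners[step]()

Real orchestrators layer scheduling, retries, parallelism, and observability on top of this ordering, but dependency resolution is the common foundation.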
