Chapter 3. Data Orchestration
Though we’ve already discussed ingestion (E, L) and transformation (T), we’ve only scratched the surface of ETL. Although data pipelines are often viewed as a series of discrete steps, there are overarching mechanisms that operate on a meta level, aptly dubbed “undercurrents” by Matt Housley and Joe Reis in Fundamentals of Data Engineering:
- Security
- Data management
- Data operations (DataOps)
- Data architecture
- Data orchestration
- Software engineering
In this chapter, we’ll explore dependency management and pipeline orchestration, touching on the history of orchestrators, which is important for understanding why certain methods of orchestration are popular today. We’ll present a menu of options for you to orchestrate your own data workflows and discuss some common design patterns in orchestration.
Throughout, we’ll discuss how an “orchestrator” has historically been separate from a “transformation” tool. We’ll touch on why this has been true and why it might not be true in the future, though we still believe a separate orchestrator is the preferred approach.
What Is Data Orchestration?
Every workflow, data or not, requires sequential steps: attempting to use a French press without heating water will only brew disappointment, whereas poorly sequenced data transformations might brew a storm far more bitter than a caffeine-deprived morning (though the woes of the decaffeinated are not to be trivialized). In data, these “steps” are often ...
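To make the idea of sequencing concrete, here is a minimal sketch that derives a valid run order from declared dependencies, which is the core job an orchestrator performs. It assumes a hypothetical pipeline whose task names (extract_orders, load_raw, transform_sales, and so on) are illustrative only, and it uses Python's standard-library graphlib.TopologicalSorter rather than any particular orchestrator's API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline (names are illustrative): each task maps to the
# set of tasks that must finish before it can run.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "load_raw": {"extract_orders", "extract_customers"},
    "transform_sales": {"load_raw"},
    "publish_dashboard": {"transform_sales"},
}


def execution_order(graph):
    """Return one valid run order for the tasks in `graph`.

    Raises graphlib.CycleError (a ValueError subclass) if the
    dependencies are circular, since no valid order exists then.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in execution_order(pipeline):
        print(step)
```

Any order this produces runs both extracts before the load, the load before the transformation, and the transformation before publishing. The relative order of the two independent extract tasks is not fixed, which is exactly the freedom an orchestrator can use to run such tasks in parallel.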