9

Orchestrating Your Data Workflows

We have covered a wealth of techniques for building data platforms. However, a few components are still missing before we can orchestrate everything end to end. We’ve mentioned Databricks Workflows but haven’t looked closely at how it works, and we haven’t yet covered logging or secrets management. Workflows is the orchestration tool used to manage data pipelines in Databricks. Orchestration tools typically support common data tasks and keep a run history for each pipeline. Having a central place to manage all your pipelines is a critical step toward reliable, scalable data pipelines. So, this chapter will discuss these topics in detail and create more stability ...
