11 Best practices

This chapter covers

  • Writing clean, understandable DAGs using style conventions
  • Using consistent approaches for managing credentials and configuration options
  • Generating repeated DAGs and tasks using factory functions (sketched after this list)
  • Designing reproducible tasks by enforcing idempotency and determinism constraints
  • Handling data efficiently by limiting the amount of data processed in your DAG
  • Using efficient approaches for handling/storing (intermediate) data sets
  • Managing concurrency using resource pools (sketched after this list)

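As a preview of the factory-function pattern, the following is a minimal sketch that generates one export DAG per dataset. It assumes Airflow 2 import paths; the generate_export_dag helper and the dataset names are illustrative, not taken from the chapter's listings.

import datetime as dt

from airflow import DAG
from airflow.operators.bash import BashOperator


def generate_export_dag(dataset_name, schedule_interval="@daily"):
    """Build a small export DAG for one dataset (illustrative helper)."""
    dag = DAG(
        dag_id=f"export_{dataset_name}",
        start_date=dt.datetime(2023, 1, 1),
        schedule_interval=schedule_interval,
    )
    with dag:
        BashOperator(
            task_id="export",
            bash_command=f"echo 'exporting {dataset_name}'",
        )
    return dag


# Register one generated DAG per dataset in the module's global namespace
# so the Airflow scheduler can discover each of them.
for dataset in ["customers", "orders", "products"]:
    globals()[f"export_{dataset}"] = generate_export_dag(dataset)

Keeping the construction logic in one function avoids copy-pasting near-identical DAG definitions and makes style conventions easier to enforce across the generated DAGs.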
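Similarly, here is a hedged sketch of limiting concurrency with a resource pool. The pool name db_pool and its three slots are illustrative; the pool is created once (for example via the CLI), and tasks opt into it with the pool argument.

# Create the pool once, e.g. from the CLI:
#   airflow pools set db_pool 3 "Limit concurrent database exports"

import datetime as dt

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="pooled_exports",
    start_date=dt.datetime(2023, 1, 1),
    schedule_interval="@daily",
) as dag:
    for table in ["customers", "orders", "products"]:
        BashOperator(
            task_id=f"export_{table}",
            bash_command=f"echo 'exporting {table}'",
            pool="db_pool",  # at most three of these tasks run concurrently
        )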
In previous chapters, we have described most of the basic elements that go into building and designing data processes using Airflow DAGs. In this chapter, we dive a bit deeper into some best practices that can help you write well-architected ...
