Data Pipelines with Apache Airflow

by Julian de Ruiter, Bas Harenslak
May 2021
Beginner to intermediate
480 pages
12h 59m
English
Manning Publications

11 Best practices

This chapter covers

  • Writing clean, understandable DAGs using style conventions
  • Using consistent approaches for managing credentials and configuration options
  • Generating repeated DAGs and tasks using factory functions
  • Designing reproducible tasks by enforcing idempotency and determinism constraints
  • Handling data efficiently by limiting the amount of data processed in your DAG
  • Using efficient approaches for handling/storing (intermediate) data sets
  • Managing concurrency using resource pools
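
To make the idempotency and determinism item above concrete, here is a minimal sketch in plain Python rather than Airflow code. The function name `write_daily_partition`, the one-file-per-run-date layout, and the JSON format are all assumptions made for illustration; they are not taken from the book. The idea is that a task which overwrites the output for its run date, and writes records in a fixed order, can be rerun without changing the result.

```python
import json
import tempfile
from pathlib import Path


def write_daily_partition(output_dir: Path, run_date: str, records: list) -> Path:
    """Write the records for one run date, replacing any earlier result.

    Hypothetical helper for illustration only (not an Airflow API).
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    # One file per run date: a rerun targets the same path every time.
    target = output_dir / f"{run_date}.json"
    # Mode "w" truncates the file, so a rerun overwrites the partition
    # instead of appending duplicate rows to it.
    with target.open("w") as handle:
        # Sorting gives the same file contents regardless of input order.
        json.dump(sorted(records), handle)
    return target


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    first = write_daily_partition(workdir, "2021-05-01", [3, 1, 2]).read_text()
    second = write_daily_partition(workdir, "2021-05-01", [2, 3, 1]).read_text()
    print(first == second)
```

Appending to a shared file would instead produce duplicates on every retry, which is the behavior an idempotent task is meant to avoid.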

In previous chapters, we have described most of the basic elements that go into building and designing data processes using Airflow DAGs. In this chapter, we dive a bit deeper into some best practices that can help you write well-architected ...


ISBN: 9781617296901