Data Pipelines with Apache Airflow

by Julian de Ruiter, Bas Harenslak
May 2021
Beginner to intermediate
480 pages
12h 59m
English
Manning Publications
Content preview from Data Pipelines with Apache Airflow

3 Scheduling in Airflow

This chapter covers

  • Running DAGs at regular intervals
  • Constructing dynamic DAGs to process data incrementally
  • Loading and reprocessing past data sets using backfilling
  • Applying best practices for reliable tasks

In the previous chapter, we explored Airflow’s UI and showed you how to define a basic Airflow DAG and run it every day by defining a schedule interval. In this chapter, we will dive deeper into the concept of scheduling in Airflow and explore how it allows you to process data incrementally at regular intervals. First, we’ll introduce a small use case focused on analyzing user events from our website and explore how we can build a DAG to analyze these events at regular intervals. Next, we’ll explore ...
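
To make the idea of processing data incrementally at regular intervals a bit more concrete, here is a minimal sketch in plain Python. It does not use Airflow's API; the function name `daily_intervals` and the example dates are hypothetical, chosen only for illustration. It shows how a daily schedule can be thought of as splitting a date range into consecutive, non-overlapping intervals, each of which is processed on its own.

```python
from datetime import datetime, timedelta


def daily_intervals(start, end):
    """Yield (interval_start, interval_end) pairs for each full day in [start, end).

    Hypothetical helper for illustration only; not part of Airflow.
    """
    step = timedelta(days=1)
    current = start
    # Only emit intervals that have fully elapsed before `end`.
    while current + step <= end:
        yield current, current + step
        current += step


if __name__ == "__main__":
    # Assumed example range: three full days of events.
    for begin, finish in daily_intervals(datetime(2019, 1, 1), datetime(2019, 1, 4)):
        print(f"process events from {begin:%Y-%m-%d} up to {finish:%Y-%m-%d}")
```

Each pair describes one increment of data, so a task only needs to handle the events that fall inside its own interval rather than the full history.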

ISBN: 9781617296901