
Airflow Development Best Practices

Published by Pearson

Content level: Intermediate

Start Quickly, Build Efficiently, Account for Dependencies, and Debug Pipelines

  • Learn Airflow basics, including setting up secrets for your pipelines and CI/CD workflows in GitHub
  • Build a basic pipeline with unit testing
  • Set up Slack messaging for errors, as well as quickly and efficiently debug pipelines

This course gets you up and running with your first basic Airflow pipelines, focusing on general database and Python connector usage. You dive into adding dependencies and secrets within GitHub, and learn how to set the same thing up on AWS or GCP. The final key piece covers how to create unit tests for pipelines and set up Slack messaging to notify users of any errors. We run through error handling examples and how best to approach debugging Airflow pipelines. Overall, the aim is to give users a well-rounded understanding of the key ETL steps and needs within Airflow.

What you’ll learn and how you can apply it

By the end of the live online course, you’ll understand:

  • How to stand up Airflow and how to use the provided basic operators (e.g., database vs. Python operators)
  • How to debug a pipeline and build out testing for pipelines, as well as simple Slack messaging for error visibility
  • How to set up secrets and dependencies in your CI/CD pipelines for Airflow, including using GCP/AWS secrets and how to plug those in

And you’ll be able to:

  • Get your first pipeline up and running for general ETL
  • Set up secrets within GitHub to protect sensitive data
  • Build basic unit testing for ETL pipelines

This live event is for you because...

  • You are interested in using Airflow for your ETL processing
  • You want to level up your Airflow skills with unit testing and secret management
  • This course is good for beginner and intermediate users; intermediate users will learn to build out their pipelines more robustly

Prerequisites

  • Basic understanding of SQL and Python
  • Basic understanding of ETL pipelines
  • Basic understanding of GitHub and CI/CD pipelines

Course Set-up

  • GitHub repo
  • Documentation links for Airflow and course set-up are listed in the README of the GitHub repo


Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Segment 1: Spinning Up Your First Airflow Pipeline (45 minutes)

  • Configuring Docker, MySQL, and Airflow locally
  • Starting a basic Airflow cluster
  • Creating basic database operator workflows: table load/insert/update (most likely MySQL)
  • Creating a basic Python operator workflow

Students will have time to push their own basic ETL pipeline using either database or Python operators; a minimal DAG sketch follows.
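To make this concrete, here is a minimal sketch of the kind of first DAG built in this segment. It assumes Airflow 2.x; the DAG ID and task names are illustrative, not part of the course materials.

```python
# A minimal first-DAG sketch, assuming Airflow 2.x.
# "my_first_etl" and the task names are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_and_load():
    # Placeholder ETL step; in class this would read from MySQL,
    # transform rows in Python, and write them to a target table.
    print("running a basic ETL step")


with DAG(
    dag_id="my_first_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # "schedule_interval" on Airflow versions before 2.4
    catchup=False,
) as dag:
    PythonOperator(
        task_id="extract_and_load",
        python_callable=extract_and_load,
    )
```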

Break (10 minutes)

Q&A (5 minutes)

Segment 2: Working with Secrets in GitHub/AWS/GCP (45 minutes)

  • Setting up your CI/CD pipeline to pull secrets
  • Setting up secrets in AWS
  • Setting up secrets in GCP

Students will set up secrets in GitHub, since it is the most readily available option; a sketch of consuming an injected secret follows.
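As a rough illustration of the pattern, the sketch below reads a secret that a CI/CD pipeline (GitHub Actions, or an AWS/GCP equivalent) has exported as an environment variable. The variable names are hypothetical; Airflow can also pick up connections directly from AIRFLOW_CONN_<CONN_ID> environment variables.

```python
# A hedged sketch of consuming a CI/CD-injected secret; MY_DB_PASSWORD
# and "my_db_password" are hypothetical names, not course-provided ones.
import os

from airflow.models import Variable


def get_db_password() -> str:
    # Prefer the environment variable exported by the CI/CD pipeline;
    # fall back to an Airflow Variable configured in the UI or backend.
    password = os.environ.get("MY_DB_PASSWORD")
    if password is None:
        password = Variable.get("my_db_password")
    return password
```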

Break (10 minutes)

Q&A (5 minutes)

Segment 3: Setting Up Error Messaging in Slack and Debugging Pipelines (45 minutes)

  • Creating a Slack operator and plugging it into a Slack instance (see the failure-callback sketch after this list)
  • Where to start when debugging, depending on the error
  • Creating a test within Airflow to prevent simple errors
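For error visibility, one provider-agnostic approach is an on_failure_callback that posts to a Slack incoming webhook, as sketched below; the course may instead use the Slack provider's operator. SLACK_WEBHOOK_URL is a hypothetical environment variable.

```python
# A hedged sketch of Slack failure notifications via a plain incoming
# webhook; SLACK_WEBHOOK_URL is a hypothetical environment variable.
import os

import requests


def notify_slack_on_failure(context):
    # Airflow passes the task context dict to on_failure_callback.
    ti = context["task_instance"]
    message = (
        f"Task `{ti.task_id}` in DAG `{ti.dag_id}` failed "
        f"for run date {context['ds']}"
    )
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json={"text": message})
```

Wiring it up is then a matter of passing default_args={"on_failure_callback": notify_slack_on_failure} to the DAG.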

Students will write their first unit test for the pipeline; a minimal example follows.
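A common first test, sketched here under the assumption of a pytest setup and a local dags/ folder, loads every DAG file and asserts there are no import errors:

```python
# A minimal DAG-integrity test, assuming pytest and a local dags/ folder.
from airflow.models import DagBag


def test_dags_load_without_errors():
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    # Syntax errors, missing imports, and cycles all surface here.
    assert dag_bag.import_errors == {}
```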

Course wrap-up and next steps (20 minutes)

Q&A (10 minutes)

Your Instructor

  • Brittney Monroe

Brittney Monroe has a deep history in the data realm, with experience as a data analyst, database administrator, and data engineer over the last few years. She is incredibly passionate about data governance and validation solutions, and the best way to implement both in tandem within data systems. Brittney is deeply interested in narrowing the gap between data science and data engineering.

Skill covered

Apache Airflow