Data Pipelines with Apache Airflow, video edition

Video description

In Video Editions, the narrator reads the book while the content, figures, code listings, diagrams, and text appear on the screen. It's like an audiobook that you can also watch as a video.

An Airflow bible. Useful for all kinds of users, from novice to expert.
Rambabu Posa, Sai Aashika Consultancy

A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks and keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.

about the technology

Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow well suited to a wide range of data management tasks.

about the book

Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You’ll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline’s needs.

what's inside

  • Build, test, and deploy Airflow pipelines as DAGs
  • Automate moving and transforming data
  • Analyze historical datasets using backfilling
  • Develop custom components
  • Set up Airflow in production environments

about the audience

For DevOps engineers, data engineers, machine learning engineers, and sysadmins with intermediate Python skills.

about the author

Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies. Bas is also an Airflow committer.

An easy-to-follow exploration of the benefits of orchestrating your data pipeline jobs with Airflow.
Daniel Lamblin, Coupang

The one reference you need to create, author, schedule, and monitor workflows with Apache Airflow. Clear recommendation.
Thorsten Weber, bbv Software Services AG

By far the best resource for Airflow.
Jonathan Wood, LexisNexis

NARRATED BY JULIE BRIERLEY

Table of contents

  1. Part 1. Getting started
  2. Chapter 1 Meet Apache Airflow
  3. Chapter 1 Pipeline graphs vs. sequential scripts
  4. Chapter 1 Introducing Airflow
  5. Chapter 1 When to use Airflow
  6. Chapter 2 Anatomy of an Airflow DAG
  7. Chapter 2 Running a DAG in Airflow
  8. Chapter 2 Running at regular intervals
  9. Chapter 3 Scheduling in Airflow
  10. Chapter 3 Cron-based intervals
  11. Chapter 3 Processing data incrementally
  12. Chapter 3 Understanding Airflow’s execution dates
  13. Chapter 3 Best practices for designing tasks
  14. Chapter 4 Templating tasks using the Airflow context
  15. Chapter 4 Templating the PythonOperator
  16. Chapter 4 Hooking up other systems
  17. Chapter 5 Defining dependencies between tasks
  18. Chapter 5 Branching
  19. Chapter 5 Conditional tasks
  20. Chapter 5 More about trigger rules
  21. Chapter 5 Sharing data between tasks
  22. Chapter 5 Chaining Python tasks with the Taskflow API
  23. Part 2. Beyond the basics
  24. Chapter 6 Triggering workflows
  25. Chapter 6 Polling custom conditions
  26. Chapter 6 Triggering other DAGs
  27. Chapter 7 Communicating with external systems
  28. Chapter 7 Developing locally with external systems
  29. Chapter 7 Moving data between systems
  30. Chapter 8 Building custom components
  31. Chapter 8 Building a custom hook
  32. Chapter 8 Building a custom operator
  33. Chapter 8 Packaging your components
  34. Chapter 9 Testing
  35. Chapter 9 Setting up a CI/CD pipeline
  36. Chapter 9 Testing with files on disk
  37. Chapter 9 Working with external systems
  38. Chapter 9 Using tests for development
  39. Chapter 10 Running tasks in containers
  40. Chapter 10 Introducing containers
  41. Chapter 10 Containers and Airflow
  42. Chapter 10 Creating container images for tasks
  43. Chapter 10 Running tasks in Kubernetes
  44. Chapter 10 Using the KubernetesPodOperator
  45. Part 3. Airflow in practice
  46. Chapter 11 Best practices
  47. Chapter 11 Manage credentials centrally
  48. Chapter 11 Use factories to generate common patterns
  49. Chapter 11 Designing reproducible tasks
  50. Chapter 11 Handling data efficiently
  51. Chapter 11 Managing your resources
  52. Chapter 12 Operating Airflow in production
  53. Chapter 12 Which executor is right for me?
  54. Chapter 12 A closer look at the scheduler
  55. Chapter 12 Installing each executor
  56. Chapter 12 Setting up the KubernetesExecutor
  57. Chapter 12 Capturing logs of all Airflow processes
  58. Chapter 12 Visualizing and monitoring Airflow metrics
  59. Chapter 12 Creating dashboards with Grafana
  60. Chapter 12 How to get notified of a failing task
  61. Chapter 12 Scalability and performance
  62. Chapter 13 Securing Airflow
  63. Chapter 13 Encrypting data at rest
  64. Chapter 13 Encrypting traffic to the webserver
  65. Chapter 13 Fetching credentials from secret management systems
  66. Chapter 14 Project: Finding the fastest way to get around NYC
  67. Chapter 14 Extracting the data
  68. Chapter 14 Structuring a data pipeline
  69. Part 4. In the clouds
  70. Chapter 15 Airflow in the clouds
  71. Chapter 15 Google Cloud Composer
  72. Chapter 16 Airflow on AWS
  73. Chapter 16 AWS-specific hooks and operators
  74. Chapter 16 Building the DAG
  75. Chapter 17 Airflow on Azure
  76. Chapter 17 Overview
  77. Chapter 18 Airflow in GCP
  78. Chapter 18 Integrating with Google services
  79. Chapter 18 GCP-specific hooks and operators
  80. Chapter 18 Getting data into BigQuery

Product information

  • Title: Data Pipelines with Apache Airflow, video edition
  • Author(s): Julian de Ruiter, Bas Harenslak
  • Release date: May 2021
  • Publisher(s): Manning Publications