Video description
In Video Editions, the narrator reads the book while the content, figures, code listings, diagrams, and text appear on the screen. It's like an audiobook that you can also watch as a video.
An Airflow bible. Useful for all kinds of users, from novice to expert.
Rambabu Posa, Sai Aashika Consultancy
A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.
about the technology
Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow perfect for any data management task.
about the book
Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You’ll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline’s needs.
what's inside
- Build, test, and deploy Airflow pipelines as DAGs
- Automate moving and transforming data
- Analyze historical datasets using backfilling
- Develop custom components
- Set up Airflow in production environments
about the audience
For DevOps, data engineers, machine learning engineers, and sysadmins with intermediate Python skills.
about the authors
Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies. Bas is also an Airflow committer.
An easy-to-follow exploration of the benefits of orchestrating your data pipeline jobs with Airflow.
Daniel Lamblin, Coupang
The one reference you need to create, author, schedule, and monitor workflows with Apache Airflow. Clear recommendation.
Thorsten Weber, bbv Software Services AG
By far the best resource for Airflow.
Jonathan Wood, LexisNexis
NARRATED BY JULIE BRIERLEY
Table of contents
- Part 1. Getting started
- Chapter 1 Meet Apache Airflow
- Chapter 1 Pipeline graphs vs. sequential scripts
- Chapter 1 Introducing Airflow
- Chapter 1 When to use Airflow
- Chapter 2 Anatomy of an Airflow DAG
- Chapter 2 Running a DAG in Airflow
- Chapter 2 Running at regular intervals
- Chapter 3 Scheduling in Airflow
- Chapter 3 Cron-based intervals
- Chapter 3 Processing data incrementally
- Chapter 3 Understanding Airflow’s execution dates
- Chapter 3 Best practices for designing tasks
- Chapter 4 Templating tasks using the Airflow context
- Chapter 4 Templating the PythonOperator
- Chapter 4 Hooking up other systems
- Chapter 5 Defining dependencies between tasks
- Chapter 5 Branching
- Chapter 5 Conditional tasks
- Chapter 5 More about trigger rules
- Chapter 5 Sharing data between tasks
- Chapter 5 Chaining Python tasks with the Taskflow API
- Part 2. Beyond the basics
- Chapter 6 Triggering workflows
- Chapter 6 Polling custom conditions
- Chapter 6 Triggering other DAGs
- Chapter 7 Communicating with external systems
- Chapter 7 Developing locally with external systems
- Chapter 7 Moving data between systems
- Chapter 8 Building custom components
- Chapter 8 Building a custom hook
- Chapter 8 Building a custom operator
- Chapter 8 Packaging your components
- Chapter 9 Testing
- Chapter 9 Setting up a CI/CD pipeline
- Chapter 9 Testing with files on disk
- Chapter 9 Working with external systems
- Chapter 9 Using tests for development
- Chapter 10 Running tasks in containers
- Chapter 10 Introducing containers
- Chapter 10 Containers and Airflow
- Chapter 10 Creating container images for tasks
- Chapter 10 Running tasks in Kubernetes
- Chapter 10 Using the KubernetesPodOperator
- Part 3. Airflow in practice
- Chapter 11 Best practices
- Chapter 11 Manage credentials centrally
- Chapter 11 Use factories to generate common patterns
- Chapter 11 Designing reproducible tasks
- Chapter 11 Handling data efficiently
- Chapter 11 Managing your resources
- Chapter 12 Operating Airflow in production
- Chapter 12 Which executor is right for me?
- Chapter 12 A closer look at the scheduler
- Chapter 12 Installing each executor
- Chapter 12 Setting up the KubernetesExecutor
- Chapter 12 Capturing logs of all Airflow processes
- Chapter 12 Visualizing and monitoring Airflow metrics
- Chapter 12 Creating dashboards with Grafana
- Chapter 12 How to get notified of a failing task
- Chapter 12 Scalability and performance
- Chapter 13 Securing Airflow
- Chapter 13 Encrypting data at rest
- Chapter 13 Encrypting traffic to the webserver
- Chapter 13 Fetching credentials from secret management systems
- Chapter 14 Project: Finding the fastest way to get around NYC
- Chapter 14 Extracting the data
- Chapter 14 Structuring a data pipeline
- Part 4. In the clouds
- Chapter 15 Airflow in the clouds
- Chapter 15 Google Cloud Composer
- Chapter 16 Airflow on AWS
- Chapter 16 AWS-specific hooks and operators
- Chapter 16 Building the DAG
- Chapter 17 Airflow on Azure
- Chapter 17 Overview
- Chapter 18 Airflow in GCP
- Chapter 18 Integrating with Google services
- Chapter 18 GCP-specific hooks and operators
- Chapter 18 Getting data into BigQuery
Product information
- Title: Data Pipelines with Apache Airflow, video edition
- Author(s): Bas Harenslak, Julian de Ruiter
- Release date: May 2021
- Publisher(s): Manning Publications
- ISBN: None