11

Automating Your Data Ingestion Pipelines

Data sources are updated frequently, which requires us to keep our data lake up to date. With multiple sources or projects, however, triggering every data pipeline by hand quickly becomes impractical. Data pipeline automation makes ingesting and processing data mechanical, removing the need for a person to start each run. Configuring automation well streamlines data flow and improves data quality by reducing errors and inconsistencies.

In this chapter, we will cover how to automate data ingestion pipelines in Airflow. We will also look at two essential topics in data engineering, data replication and historical data ingestion, as well as best practices.

In this chapter, we will cover the following ...

