10

Data Pipeline Orchestration

Once you have defined the business logic and transformations for your data, you need a reliable way to stitch them together. If a step fails, you should be notified and be able to quickly identify which tasks failed so that you can investigate them. This is where data pipeline orchestration comes in. Orchestration is the coordination and management of data transformation tasks through well-defined dependencies between them. There are many business reasons for orchestration, but consider a simple example: a report must be delivered every day, so the data behind that report must also be processed every day, and that processing must finish before the report is produced. Scheduling and sequencing this work reliably is exactly the problem orchestration solves.
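
To make the idea of "tasks with well-defined dependencies" concrete, here is a minimal in-process sketch in Scala. It is not how any particular orchestration tool works. `Task`, `RunReport`, `MiniOrchestrator`, and `DailyReportPipeline` are hypothetical names used only for illustration. The sketch orders tasks so each one runs after its dependencies, records failures instead of aborting silently, and skips any task whose upstream dependency did not succeed, which is what lets you pinpoint the failed step.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical task: a name, the names of the tasks it depends on, and the work it performs.
final case class Task(name: String, dependsOn: Set[String], run: () => Unit)

// Hypothetical outcome of one pipeline run. Failures are kept with their error
// messages so they could be forwarded to an alerting channel (not shown here).
final case class RunReport(succeeded: List[String], failed: Map[String, String], skipped: List[String])

object MiniOrchestrator {
  // Simple layered topological sort: repeatedly take every task whose
  // dependencies have already been placed, so each task follows its upstreams.
  def topologicalOrder(tasks: Seq[Task]): List[Task] = {
    val names = tasks.map(_.name).toSet
    require(tasks.forall(_.dependsOn.subsetOf(names)), "unknown dependency")
    var remaining = tasks.toList
    var placed = Set.empty[String]
    val ordered = List.newBuilder[Task]
    while (remaining.nonEmpty) {
      val (ready, blocked) = remaining.partition(_.dependsOn.subsetOf(placed))
      require(ready.nonEmpty, "dependency cycle detected")
      ordered ++= ready
      placed ++= ready.map(_.name)
      remaining = blocked
    }
    ordered.result()
  }

  // Runs tasks in dependency order. A failing task is recorded with its error,
  // and every task downstream of it is skipped rather than run on bad input.
  def run(tasks: Seq[Task]): RunReport = {
    var succeeded = List.empty[String]
    var failed = Map.empty[String, String]
    var skipped = List.empty[String]
    for (task <- topologicalOrder(tasks)) {
      if (!task.dependsOn.subsetOf(succeeded.toSet)) {
        skipped :+= task.name
      } else {
        Try(task.run()) match {
          case Success(_) => succeeded :+= task.name
          case Failure(e) => failed += (task.name -> Option(e.getMessage).getOrElse(e.toString))
        }
      }
    }
    RunReport(succeeded, failed, skipped)
  }
}

// Hypothetical daily report pipeline: extract -> transform -> report.
// The transform step fails on purpose to show how failures are isolated.
object DailyReportPipeline {
  val tasks: Seq[Task] = Seq(
    Task("extract", Set.empty, () => ()),
    Task("transform", Set("extract"), () => throw new RuntimeException("bad record")),
    Task("report", Set("transform"), () => ())
  )
  lazy val report: RunReport = MiniOrchestrator.run(tasks)
}
```

In this run, `extract` succeeds, `transform` is recorded as failed with the message `bad record`, and `report` is skipped because its upstream did not succeed. A production orchestrator adds scheduling, retries, persistence, and alerting on top of this basic dependency-ordering idea.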

In this chapter, we are going to look at some of the most common tools and ...
