Chapter 18. Orchestrating Machine Learning Pipelines

In Chapter 1, we introduced ML pipelines and explained why we need them to produce reproducible and repeatable ML models. In the chapters that followed, we took a deep dive into the individual aspects of ML pipelines, from data ingestion, data validation, model training, and model evaluation all the way to model deployment. Now it’s time to close the loop and focus on how to assemble the individual components into production pipelines.

All the components of an ML pipeline described in the previous chapters need to be executed in a coordinated way or, as we say, orchestrated. The inputs to a component must be computed before that component is executed. This orchestration is performed by tools such as Apache Beam, Kubeflow Pipelines, or Google Cloud’s Vertex Pipelines.
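The ordering constraint above can be pictured as a dependency graph in which each component lists the components whose outputs it consumes. The following is only a minimal sketch under stated assumptions, not how any of the orchestrators named here is implemented: the component names and the `execution_order` helper are hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline for illustration only: each component maps to the
# set of components whose outputs it needs as inputs.
dependencies = {
    "data_ingestion": set(),
    "data_validation": {"data_ingestion"},
    "model_training": {"data_validation"},
    "model_evaluation": {"model_training"},
    "model_deployment": {"model_evaluation"},
}


def execution_order(deps):
    """Return an order in which every component runs after its inputs exist.

    TopologicalSorter raises graphlib.CycleError if the dependencies
    contain a cycle, that is, if the graph is not acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because this example pipeline is a simple chain, only one valid order exists. A real orchestrator also handles scheduling, retries, and passing artifacts between components, none of which this sketch attempts.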

In this chapter, we focus on orchestrating the ML components: we introduce different orchestration tools and explain how to pick the best one for your project.

An Introduction to Pipeline Orchestration

Pipeline orchestration is the “glue” between your pipeline components, such as data ingestion, preprocessing, model training, and model evaluation. Before diving into the details of the different orchestration options, let’s review why we need pipeline orchestration in the first place and introduce the concept of directed acyclic graphs.

Why Pipeline Orchestration?

Pipeline orchestration connects the pipeline components and ensures that ...

Machine Learning Production Systems by Robert Crowe, Hannes Hapke, Emily Caveness, Di Zhu