Kubeflow for Machine Learning
by Trevor Grant, Holden Karau, Boris Lublinsky, Richard Liu, Ilan Filonenko
Chapter 4. Kubeflow Pipelines
In the previous chapter we described Kubeflow Pipelines, the component of Kubeflow that orchestrates machine learning applications. Orchestration is necessary because a typical machine learning implementation uses a combination of tools to prepare data, train the model, evaluate performance, and deploy. By formalizing the steps and their sequencing in code, pipelines allow users to capture all of the data processing, training, and deployment steps, ensuring their reproducibility and auditability.
We will start this chapter by taking a look at the Pipelines UI and showing how to start writing simple pipelines in Python. We'll explore how to transfer data between stages, and then look at ways to leverage existing applications as part of a pipeline. We will also look at the underlying workflow engine, Argo Workflows, a standard Kubernetes pipeline tool that Kubeflow uses to run pipelines. Understanding the basics of Argo Workflows gives you a deeper understanding of Kubeflow Pipelines and will aid in debugging. We will then show what Kubeflow Pipelines adds on top of Argo.
We'll wrap up the chapter by showing how to implement conditional execution in pipelines and how to run pipelines on a schedule. Task-specific pipeline components will be covered in their respective chapters.
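To make the material that follows more concrete, here is a minimal sketch of what a pipeline definition looks like with the kfp Python SDK, using the v1-style dsl.ContainerOp API. The pipeline name, the busybox image, and the hello_pipeline function are illustrative choices for this sketch, not examples taken from the chapter.

```python
import kfp
from kfp import dsl


@dsl.pipeline(
    name="hello-pipeline",
    description="A trivial single-step pipeline used only for illustration.",
)
def hello_pipeline():
    # Each pipeline step runs in a container; this one just echoes a message.
    dsl.ContainerOp(
        name="echo",
        image="busybox",
        command=["sh", "-c"],
        arguments=["echo 'hello from Kubeflow Pipelines'"],
    )


if __name__ == "__main__":
    # Compile the Python definition into a workflow package that can be
    # uploaded through the Pipelines UI or submitted with the kfp client.
    kfp.compiler.Compiler().compile(hello_pipeline, "hello_pipeline.yaml")
```

The compiled output is an Argo workflow specification, which is why a basic understanding of Argo Workflows, covered later in this chapter, helps when debugging pipeline runs.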
Getting Started with Pipelines
The Kubeflow Pipelines platform consists of:
- A UI for managing and tracking pipelines and ...