Chapter 12. Pipelines Part 2: Kubeflow Pipelines
In Chapter 11, we discussed the orchestration of our pipelines with Apache Beam and Apache Airflow. These two orchestration tools have some great benefits: Apache Beam is simple to set up, and Apache Airflow is widely adopted for other ETL tasks.
In this chapter, we discuss the orchestration of our pipelines with Kubeflow Pipelines. Kubeflow Pipelines allows us to run machine learning tasks within Kubernetes clusters, which provides a highly scalable pipeline solution. As we discussed in Chapter 11 and show in Figure 12-1, our orchestration tool takes care of the coordination between the pipeline components.
Figure 12-1. Pipeline orchestrators
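To make "coordination between the pipeline components" concrete, here is a minimal sketch in plain Python. It is not the Kubeflow Pipelines API, only a toy illustration of what any orchestrator does: run each component after everything it depends on has finished. The component names and the `run_pipeline` and `log_component` helpers are hypothetical, and the sketch assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical component graph: each key maps to the components it depends on.
PIPELINE = {
    "ingest": [],
    "validate": ["ingest"],
    "transform": ["validate"],
    "train": ["transform"],
    "evaluate": ["train"],
    "deploy": ["evaluate"],
}


def run_pipeline(graph, run_component):
    """Run every component once, after all of its upstream components."""
    executed = []
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        run_component(name)
        executed.append(name)
    return executed


def log_component(name):
    # Stand-in for real work; an orchestrator would launch a task here.
    print(f"running {name}")


order = run_pipeline(PIPELINE, log_component)
```

A real orchestrator adds what this sketch leaves out, such as scheduling, retries, and passing artifacts between components, but the dependency ordering is the core idea.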
Setting up Kubeflow Pipelines is more complex than installing Apache Airflow or Apache Beam. But, as we will discuss later in this chapter, it provides useful features, including the Pipeline Lineage Browser, TensorBoard integration, and the ability to view TFDV and TFMA visualizations. Furthermore, it leverages the advantages of Kubernetes, such as autoscaling of computation pods, persistent volumes, and resource requests and limits, to name just a few.
This chapter is split into two parts. In the first part, we will discuss how to set up and execute pipelines with Kubeflow Pipelines. The demonstrated setup is independent from the execution environment. It can be a cloud provider ...