July 2024
Intermediate to advanced
296 pages
7h 4m
English
In this chapter, we will cover the deployment of key big data technologies – Spark, Airflow, and Kafka – on Kubernetes. As container orchestration and management have become critical for running data workloads efficiently, Kubernetes has emerged as the de facto standard for both. By the end of this chapter, you will be able to deploy and manage big data stacks on Kubernetes and use them to build robust data pipelines and applications.
We will start by deploying Apache Spark on Kubernetes using the Spark operator. You will learn how to configure and monitor Spark jobs running as Spark applications on your Kubernetes cluster. Being able to run Spark workloads on Kubernetes brings important benefits such ...