Chapter 10

Building a Big Data Pipeline on Kubernetes

In the previous chapters, we covered the individual components required for building big data pipelines on Kubernetes, exploring tools such as Kafka, Spark, Airflow, and Trino. In the real world, however, these tools don't operate in isolation: they must be integrated and orchestrated into complete data pipelines that can satisfy a range of data processing requirements.

In this chapter, we will bring together all the knowledge and skills you have acquired so far and put them into practice by building two complete data pipelines: a batch processing pipeline and a real-time pipeline. By the end of this chapter, you will be able to (1) deploy and orchestrate all the necessary tools for building ...
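Before diving in, it helps to picture what "integrated and orchestrated" means in practice. The sketch below expresses a batch pipeline as ordered stages, in the spirit of an orchestrator such as Airflow: each stage runs only after its upstream completes. All function and dataset names here are hypothetical illustrations, not the pipeline code built in this chapter.

```python
# Illustrative sketch of a batch pipeline as ordered stages.
# Stage names and datasets are hypothetical.

def ingest():
    # e.g. land raw events from Kafka into object storage
    return ["raw_events"]

def transform(inputs):
    # e.g. a Spark job that cleans and aggregates the raw data
    return [f"clean_{name}" for name in inputs]

def serve(outputs):
    # e.g. expose the results through Trino for ad hoc queries
    return {"tables": outputs}

# Orchestration: each stage consumes the output of its upstream stage.
raw = ingest()
clean = transform(raw)
result = serve(clean)
print(result)  # {'tables': ['clean_raw_events']}
```

In a real deployment, an orchestrator replaces these direct function calls with scheduled, retryable tasks, but the dependency structure is the same.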
