Chapter 8. Streaming Data on Kubernetes

For many people, persistence (storing the state of running applications) is the first thing that comes to mind when they think about data infrastructure. Accordingly, our focus up to this point has been on databases and storage. It’s now time to consider the other aspects of the cloud native data stack.

For those of you managing data pipelines, streaming may be your starting point, with other parts of your data infrastructure being of secondary concern. Regardless of your starting place, data movement is a vitally important part of the overall data stack. In this chapter, we’ll examine how to use streaming technologies in Kubernetes to share data securely and reliably in your cloud native applications.

Introduction to Streaming

In Chapter 1, we defined streaming as the function of moving data from one point to another and, in some cases, processing data in transit. The history of streaming is almost as long as that of persistence. As data accumulated in various isolated stores, it became evident that moving data reliably was just as important as storing it reliably. In those days, this was called messaging. Data moved slowly but deliberately, much like postal mail. Messaging infrastructure put data in a place where it could be read asynchronously, in order, with delivery guarantees. This met a critical need for systems spanning more than one computer and remains one of the foundations of distributed computing.

Modern application ...

Get Managing Cloud Native Data on Kubernetes now with the O’Reilly learning platform.