Chapter 9. Tiered Storage

Pulsar has immutable storage as a primitive and exposes interfaces for interacting with that storage through client libraries, Pulsar Functions, and Pulsar IO. This means we can replay topic data from the beginning (or from a specific offset) and tolerate events such as consumers going offline for periods of time. Supporting high-performance writes and reads in Pulsar requires the bookies in a Pulsar cluster to use fast disk storage, which is expensive relative to the alternatives. In Chapter 5, you learned how Pulsar operators can ensure that data is deleted when it’s no longer used. But what if we want to keep the data indefinitely and store it in a more cost-effective way?

Pulsar tiered storage is a mechanism for offloading data on BookKeeper that doesn’t have immediate value to a cheaper and more flexible storage solution (see Figure 9-1).

Figure 9-1. In Pulsar’s tiered storage ecosystem, data moves from bookies to other services like object storage or distributed file storage for long-term data storage needs.
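
To make the idea of offloading concrete, here is a minimal Python sketch of one plausible policy: move the oldest segments of a topic off the bookies until what remains fits under a size threshold. This is a simplified model for illustration, not Pulsar's implementation. The `Ledger` class, the `select_ledgers_to_offload` function, and the sizes used are all hypothetical, and the sketch assumes every ledger passed in is already closed and therefore eligible to move.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ledger:
    """A closed segment of a topic's log (hypothetical model, not a Pulsar class)."""
    ledger_id: int
    size_bytes: int


def select_ledgers_to_offload(
    ledgers: List[Ledger], threshold_bytes: int
) -> Tuple[List[Ledger], List[Ledger]]:
    """Split ledgers (ordered oldest first) into (offload, keep).

    The oldest ledgers are marked for offload until the data that would
    remain on the bookies is at or below threshold_bytes.
    """
    remaining = sum(ledger.size_bytes for ledger in ledgers)
    offload: List[Ledger] = []
    keep: List[Ledger] = []
    for ledger in ledgers:
        if remaining > threshold_bytes:
            # Still over budget: this ledger goes to long-term storage.
            offload.append(ledger)
            remaining -= ledger.size_bytes
        else:
            # Under budget: this and all newer ledgers stay on the bookies.
            keep.append(ledger)
    return offload, keep


if __name__ == "__main__":
    # Four ledgers, oldest first, with illustrative sizes in bytes.
    topic = [Ledger(1, 400), Ledger(2, 300), Ledger(3, 200), Ledger(4, 100)]
    to_offload, to_keep = select_ledgers_to_offload(topic, threshold_bytes=350)
    print("offload:", [ledger.ledger_id for ledger in to_offload])
    print("keep:   ", [ledger.ledger_id for ledger in to_keep])
```

In this model the newest data stays on the fast, expensive tier, where consumers are most likely to read it, while older data moves to the cheaper tier.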

In this chapter we’ll cover some of the motivation for tiered storage and discuss how to set up tiered storage using Pulsar Admin APIs and the console.

Storing Data in the Cloud

In the world of data storage, not all mechanisms of storing data are considered equal. Each approach to storing data is a trade-off between cost, efficiency, ...
