Chapter 13. Efficient Data Storage with Retriever
In this chapter, we’ll look at the challenges that must be addressed to effectively store and retrieve your observability data when you need it most. Speed is a common concern with data storage and retrieval, but other functional constraints impose key challenges that must be addressed at the data layer. At scale, the challenges inherent to observability become especially pronounced.
We will lay out the functional requirements necessary to enable observability workflows. Then we will examine real-life trade-offs and possible solutions in columnar data storage engines by practical example, using the implementation of Honeycomb’s proprietary Retriever datastore. In the next chapter, you’ll also learn about an alternative application of these concepts using ClickHouse, the widely deployed open source columnar storage engine.
Across both chapters, you will learn about the considerations required at the storage and retrieval layers to ensure speed, scalability, and durability for your observability data. You will learn how columnar datastores are architected, why they are particularly well suited to observability data, how querying workloads must be handled, and what it takes to make data storage durable and performant.
When we started working on modern observability systems, many believed storing high-cardinality event data at scale was infeasible without pre-aggregating it into metrics. Today, multiple proven event-based ...