Chapter 4. Containing the Cost
“Show me the money!”
Jerry Maguire
Volume is the problem, but not just because it is hard to navigate and work with. Data, especially in the cloud, costs money. Sometimes lots of money. If a potential $65 million bill doesn’t scare you, then your organization is doing exceptionally well. For the rest of us, cost really matters.
Processors Are the Key
In Chapters 2 and 3, you got a glimpse of how telemetry pipelines can help with cost. Some key processors that can help you control cost are deduplicate, route, reduce, sample, filter, and conversion processors.
Deduplicate Where You Can
When it comes to cost, the deduplicate processor is your brutally simple friend. By applying some simple logic, it can shrink your telemetry data streams dramatically and, importantly, without losing any information: only repeated copies of the same event are dropped.
This is why it’s such a popular processor; it reduces the volume of your data while preserving its fidelity.
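To make the idea concrete, here is a minimal sketch of deduplication logic in Python. It is not the implementation of any particular pipeline product. The event shape (a dict with a numeric `timestamp`), the `key_fields` used to decide that two events are "the same," and the 60-second window are all assumptions chosen for illustration.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator, Sequence


def deduplicate(
    events: Iterable[dict],
    key_fields: Sequence[str] = ("host", "level", "message"),  # assumed field names
    window_seconds: float = 60.0,  # assumed suppression window
) -> Iterator[dict]:
    """Yield events, skipping repeats of an already-emitted event.

    Two events count as duplicates when their key_fields match and the
    later one arrives less than window_seconds after the last emitted copy.
    """
    last_emitted: Dict[str, float] = {}  # fingerprint -> timestamp of last emitted copy
    for event in events:
        # Fingerprint only the fields that define "sameness".
        raw = json.dumps([event.get(f) for f in key_fields], default=str)
        fingerprint = hashlib.sha256(raw.encode("utf-8")).hexdigest()

        ts = float(event["timestamp"])
        previous = last_emitted.get(fingerprint)
        if previous is not None and ts - previous < window_seconds:
            continue  # duplicate inside the window: drop it

        last_emitted[fingerprint] = ts
        yield event
```

Note that this sketch keeps every fingerprint in memory forever; a real processor would need to expire old entries. The choice of `key_fields` is where the judgment lies: too few fields and distinct events collapse together, too many and nothing is ever considered a duplicate.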
Choose Your Route Carefully
At the simplest end of the scale, you can merely choose where your telemetry data goes. If you want to optimize your spend on Splunk, you can ensure that only the data necessary for Splunk is routed to it. The remaining data could be routed to low-cost storage, such as S3, so that nothing is lost just in case. It’s that simple, sort of.
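The routing decision described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real pipeline configuration: the destination names `splunk` and `s3_archive`, the `level` field, and the rule that only warnings and errors are "necessary for Splunk" are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: only these severities are worth paying Splunk rates for.
SPLUNK_LEVELS = {"WARN", "ERROR", "FATAL"}


def choose_destination(event: dict) -> str:
    """Return the (hypothetical) destination name for one event."""
    level = str(event.get("level", "")).upper()
    return "splunk" if level in SPLUNK_LEVELS else "s3_archive"


def route(events: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group events into per-destination batches."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        batches[choose_destination(event)].append(event)
    return dict(batches)
```

Everything that does not match the Splunk rule falls through to the low-cost archive, so no event is discarded; the routing rule only decides which bill it lands on.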
The art here is to ensure that you are still routing something useful to your destinations. A router might not give you the right level of intelligence to create ...