Chapter 50. Metrics Are Not SLIs (The Measure Everything Trap)
Brian Murphy
“Measure everything” is a trap.
It is throwaway advice passed down over time, and no one can rightly recall why. It’s the project handed to the summer intern when immature organizations run out of useful work to offer. We spend hours augmenting code for what-if situations and ultimately end up spamming the metric search space with useless data points. Do not measure everything.
Back when memory was expensive, you had to be picky about which metrics to store. You had to focus on the most important ones for your service. As the cost of memory fell, it became ever cheaper to store ever more metrics, justified by the hope that they would provide value someday in the (far distant) future. For most of us, that day never arrived.
The question then becomes, “What is worth measuring?” Focus on metrics that can build quality SLIs.
First, let’s describe the difference between metrics and SLIs. Metrics are raw numbers: how many items are in a queue, how many days have passed since the last failure, how many items are in a shopping cart. SLIs are combinations of metrics that tell a story: if the queue keeps filling at the current rate, how much time is left before system performance begins to degrade or the system falls over completely? Metrics provide evidence that the system simply works. SLIs provide evidence ...
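To make the queue example concrete, here is a minimal sketch of how two raw metric readings might be combined into the kind of "time left" figure described above. It assumes a simple linear fill rate between two samples; the names `QueueSample` and `seconds_until_full`, and the idea of a fixed queue `capacity`, are illustrative assumptions rather than anything prescribed in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueSample:
    """One raw metric reading: queue depth at a point in time (hypothetical type)."""
    timestamp_s: float  # seconds since some fixed reference point
    depth: int          # number of items in the queue


def seconds_until_full(older: QueueSample, newer: QueueSample,
                       capacity: int) -> Optional[float]:
    """Estimate how long until the queue reaches capacity.

    Assumes the fill rate observed between the two samples stays constant.
    Returns None when the queue is draining or holding steady.
    """
    elapsed = newer.timestamp_s - older.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")

    # Raw metrics in, rate out: items added per second.
    fill_rate = (newer.depth - older.depth) / elapsed
    if fill_rate <= 0:
        return None  # not filling, so no deadline to report

    # Clamp at zero in case the queue is already at or past capacity.
    remaining = max(capacity - newer.depth, 0)
    return remaining / fill_rate


# Two depth readings taken a minute apart, against an assumed capacity of 1,000.
earlier = QueueSample(timestamp_s=0.0, depth=400)
later = QueueSample(timestamp_s=60.0, depth=700)
print(seconds_until_full(earlier, later, capacity=1000))
```

Neither depth reading says much on its own. Combined with the elapsed time and the capacity, they yield an estimate of how long the system has before it gets into trouble, which is a number someone can act on.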