It is possible to know so much about a subject that you become totally ignorant.
Frank Herbert, Chapterhouse: Dune
In this chapter we’ll take the concept of metrics that we introduced in Chapter 15 and dive into the details. What kinds of metrics are there? Which ones are important for cloud native services? How do you choose which metrics to focus on? How do you analyze metrics data to get actionable information? And how do you turn raw metrics data into useful dashboards and alerts? Finally, we’ll outline some of the options for metrics tools and platforms.
Since a metrics-centered approach to observability is relatively new to the DevOps world, let’s take a moment to talk about exactly what metrics are, and how best to use them.
As we saw in “Introducing Metrics”, metrics are numerical measures of specific things. A familiar example from the world of traditional servers is the memory usage of a particular machine. If only 10% of physical memory is currently allocated to user processes, the machine has spare capacity. But if 90% of the memory is in use, the machine is probably pretty busy.
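To make the memory example concrete, here is a minimal sketch in Python of turning two raw numbers into a single metric value. The function name `memory_usage_percent` and the 16 GiB machine are assumptions made up for illustration; they don't come from any particular monitoring tool.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def memory_usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Return the share of physical memory in use, as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


# Hypothetical readings from a machine with 16 GiB of physical memory.
total = 16 * GIB
light_load = memory_usage_percent(used_bytes=2 * GIB, total_bytes=total)
heavy_load = memory_usage_percent(used_bytes=14 * GIB, total_bytes=total)

print(f"light load: {light_load:.1f}% in use")
print(f"heavy load: {heavy_load:.1f}% in use")
```

Each call produces one number describing the machine at the moment the reading was taken, which is all a single metric value really is.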
So one valuable kind of information that metrics can give us is a snapshot of what’s going on at a particular instant. But we can do more. Memory usage goes up and down all the time as workloads start and stop, but sometimes what we’re interested in is the change in memory usage over time.
If you sample memory ...