Chapter 8. Observability

You need visibility into what’s going on across the stack—from the kernel to user-facing parts. Often, you get that visibility by knowing the right tool for the task.

This chapter is all about gathering and using different signals that Linux and its applications generate so that you can make informed decisions. For example, you’ll see how you can do the following:

  • Figure out how much memory a process consumes

  • Understand how soon you will run out of disk space

  • Get an alert on custom events for security reasons
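
As a rough illustration of the first two items, here is a minimal Python sketch, assuming a Linux system with /proc mounted. The function names and the 500 MiB/day growth rate are hypothetical, chosen only for this example. It reads the resident set size of a process from /proc/<pid>/status and makes a naive linear estimate of the time until a filesystem fills up.

```python
import os
import shutil


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status content, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      1234 kB"
            return int(line.split()[1])
    # Kernel threads, for example, have no VmRSS entry
    return None


def process_rss_kib(pid):
    """Read the resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())


def days_until_full(free_bytes, growth_bytes_per_day):
    """Naive linear estimate; None if usage is not growing."""
    if growth_bytes_per_day <= 0:
        return None
    return free_bytes / growth_bytes_per_day


if __name__ == "__main__":
    pid = os.getpid()
    if os.path.exists(f"/proc/{pid}/status"):
        print(f"RSS of PID {pid}: {process_rss_kib(pid)} KiB")

    usage = shutil.disk_usage("/")
    # Assumed growth rate of 500 MiB per day, purely for illustration
    assumed_growth = 500 * 1024**2
    print(f"Free on /: {usage.free} bytes")
    print(f"Days until full: {days_until_full(usage.free, assumed_growth):.1f}")
```

In practice you would derive the growth rate from disk usage samples collected over time rather than hardcode it.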

To establish a common vocabulary, we’ll first review the different signal types you might come across, such as system or application logs, metrics, and process traces. We’ll also look at how to go about troubleshooting and measuring performance. Next, we’ll focus on logs specifically, reviewing different options and semantics. Then, we’ll cover monitoring for different resource types, such as CPU cycles, memory, or I/O traffic. We’ll review different tools that you can use and show some end-to-end setups you may wish to adopt.

You’ll learn that observability is often reactive. That is, something crashes or runs slowly, and you start looking at processes and their CPU or memory usage, or dig into the logs. But there are also times when observability has more of an investigative nature, for example, when you want to figure out how long certain algorithms take. Last but not least, you can use predictive (rather than reactive) observability. ...
