Chapter 15. Compute and Software Monitoring in Practice
Supporting a service for a long time attunes you to operational cues that warn of system problems. You can quickly glean helpful information from event logs. But someone new to the team doesn’t have the benefit of time and experience with your systems, so they won’t be able to get useful information from trawling through the same event logs and metrics. Moreover, if the job requires distilling all the nuance about the system from logs and metrics alone, the monitoring and documentation are inadequate.
If you manage a wide range of systems, the questions you must answer are: what can you monitor, and what has business value? Your environment and business goals are unique, so your answers to these questions may not look like anyone else’s. For this reason, I will not prescribe a specific monitoring strategy in this chapter or tell you to monitor four metrics to complete your monitoring setup.
Instead, in this chapter, I will help you discover what monitors matter to you and offer methods for evaluating different tools and frameworks to help you imagine how to use them. Monitoring outputs must tie directly to your business value and encourage team resilience.
Identify Your Desired Outputs
When planning a monitoring strategy, many start with “What should I monitor?” Instead, I propose that the first question should be “What do I need now?” or “What is causing problems with the way my team works?”
At the top ...