Chapter 2. Monitoring Conventions

Everyone seems to have a different definition of monitoring. Many folks know it as a conventional polling system like Nagios. For others, it might mean walking their networks with SNMP and Cacti, or perhaps even a bespoke collection of Perl scripts and artisanal cron jobs. Some companies don’t run internal monitoring systems at all, preferring to outsource all or part of their monitoring to hosted monitoring services. No matter which tools you cobble together or how you orchestrate them, most of these systems share a common set of operational responsibilities.

In my experience, it’s important that we, the maintainers and users of these monitoring architectures, share a common vocabulary and understanding of the logical areas of functionality that make up these systems. Describing to your peers how you “instrument your application telemetry and aggregate the results (because they report irregularly) before firing them off to your trending system for correlation and fault detection” is almost certainly going to explain more about your setup than saying, “We monitor stuff.”

Don’t get me wrong, I’m all for brevity, but words really do matter. And for better or worse, we’ve got enough of them to choke a horse. Although those two descriptions may in theory be saying roughly the same thing, the former tells us a lot more about what you do and how it adds value to your organization. Above all else, being able to speak lucidly about your monitoring ...
