1 Introduction

Let’s begin with an origin story for a company called Example.com. Once upon a time(-series), Example.com had a sysadmin. She managed infrastructure that lived in data centers. Every time a new host was added to that environment she installed a monitoring agent and set up some monitoring checks. Every now and again one of those hosts would break and a check would trigger. A notification would be sent, and she would wake up and run rm -fr /var/log/*.log to fix it.

For many years this approach worked just fine. Of course, there was some drama. Occasionally something would go wrong for which there wasn’t a check, or there just wasn’t time to act on a notification, or some applications and services on top of the hosts weren’t monitored. ...
