Chapter 6. Monitoring, Observability, and Alerting

This job would be great if it weren’t for the customers.

Randal Graves, Clerks

What Is Monitoring?

Once you have your service in the cloud, it begins to take on a whole new life. That lovely deterministic behavior you witnessed while developing and testing your code is long gone. Your service has now been sent out to face the internet, and with that traffic come states in your systems that you never anticipated. Bugs start to surface, and these perfectly chaotic flows of user actions and their consequences will likely degrade your service at least once, if not take it down entirely. When this happens, how are you supposed to know that your service is even down in the first place?

Monitoring.

Monitoring is the component of your application that enables you to detect incidents and understand why they are occurring so that you can fix them. It is also how you confirm that things have returned to normal. It is how you interrogate the health and status of your systems.
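As a rough illustration of detecting an incident and confirming recovery, here is a minimal sketch in Python. Everything in it is an assumption made for the example rather than something from a particular monitoring product: the ErrorRateMonitor class, its window size, and its 5% error threshold are all hypothetical. It keeps the outcomes of the most recent requests in a sliding window and reports the service as degraded whenever the error rate in that window exceeds the threshold.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical helper: tracks outcomes of the most recent requests."""

    def __init__(self, window_size=100, threshold=0.05):
        # Only the last `window_size` outcomes are kept; older ones drop off.
        self.outcomes = deque(maxlen=window_size)  # True means the request failed
        self.threshold = threshold  # assumed cutoff for "degraded"

    def record(self, failed):
        """Record the outcome of one request."""
        self.outcomes.append(bool(failed))

    def error_rate(self):
        """Fraction of failed requests in the current window."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def status(self):
        """Report 'degraded' while the error rate exceeds the threshold."""
        return "degraded" if self.error_rate() > self.threshold else "healthy"
```

Because old outcomes fall out of the window, the same check that flags the incident also reports when the service is healthy again. A real system would usually emit these measurements to a dedicated metrics service instead of holding them in process memory.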

But monitoring is much more than that. It is your best tool to avoid failure in the first place. With monitoring, you expose the health of your application and all of its services in real time so that you can detect anomalies before they snowball into incidents. Emitting the right metrics and being able to understand and interpret them are skills that everyone on your team must learn. In modern engineering organizations, ...
