Alerting on error-related issues

We should always be aware of whether our applications or the system are producing errors. However, we cannot start panicking at the first occurrence of an error, since that would generate so many notifications that we'd likely end up ignoring them.

Errors happen often, and many are caused by issues that are fixed automatically or by circumstances that are out of our control. If we were to act on every error, we'd need an army of people working 24/7 only on fixing issues that often do not need to be fixed. As an example, entering "panic" mode because of a single response with a code in the 500 range would almost certainly produce a permanent crisis. Instead, we should monitor the rate ...
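To make the difference between reacting to a single error and reacting to a rate of errors more concrete, here is a minimal Python sketch. It is only an illustration, not the approach used in the rest of the book: the function names `error_rate` and `should_alert`, as well as the 5% threshold, are assumptions chosen for the example.

```python
from typing import Iterable


def error_rate(status_codes: Iterable[int]) -> float:
    """Return the fraction of responses with a code in the 500 range."""
    codes = list(status_codes)
    if not codes:
        # No traffic means there is nothing to alert on.
        return 0.0
    errors = sum(1 for code in codes if 500 <= code <= 599)
    return errors / len(codes)


def should_alert(status_codes: Iterable[int], threshold: float = 0.05) -> bool:
    """Alert only when the share of 5xx responses exceeds the threshold.

    The default threshold of 0.05 (5%) is a hypothetical value.
    """
    return error_rate(status_codes) > threshold


if __name__ == "__main__":
    # A single 500 among a hundred responses stays below the threshold.
    print(should_alert([200] * 99 + [500]))
    # Ten 503 responses among a hundred exceed it.
    print(should_alert([200] * 90 + [503] * 10))
```

With such a check, one failed response among many successful ones is tolerated, while a sustained share of failures triggers a notification. What counts as an acceptable threshold depends on the application and would need to be tuned.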
