Knowing whether our applications are having trouble to respond fast to requests, whether they are being bombed with more requests than they could handle, whether they produce too many errors, and whether they are saturated, is of no use if they are not even running. Even if our alerts detect that something is wrong by notifying us that there are too many errors or that response times are slow due to an insufficient number of replicas, we should still be informed if, for example, one, or even all replicas failed to run. In the best case scenario, such a notification would provide additional info about the cause of an issue. In the much worse situation, we might find out that one of the replicas of the ...
Alerting on unschedulable or failed pods
Get The DevOps 2.5 Toolkit now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.