Focus on recovery
We have accepted the reality that to err is human and that our bounded isolated components will inevitably experience failures. We will instead focus our energies on the mean time to recovery. We have instrumented our components to be highly observable and we have strategically created synthetic transactions that continuously generate traffic through the system so that we can observe the behavior of key performance indicators. We have created alerts that monitor the key performance indicators, so that we can jump into action as soon as a problem is detected. From here we need a method for investigating the problem and diagnosing the root cause that allows us to focus our attention and recover as quickly as possible.
Teams ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access