11.1. The Investigation Roadmap

No two issues are the same, so each incident calls for its own investigation. In almost all cases, however, the investigation follows the roadmap shown in Figure 11-1 and described in the list that follows:

Figure 11-1

  1. Investigate — To understand the issue, you must obtain as much information as possible. Each incident will differ, so the amount of information required will depend entirely on the complexity and criticality of the problem. Ultimately, you want to have enough information to be able to re-create the conditions and situation, preferably in an isolated environment so that you can fully exercise and approve the proposed resolution.

  2. Re-create — Once you have enough information, re-create the issue so that you can reproduce the incident repeatedly and accurately in isolation and perform the appropriate analysis. Re-creating an incident in isolation is sometimes difficult because of its nature or its dependence on environmental scale, data, configuration, and many other factors. In my personal experience, issues that can't be reproduced are notoriously difficult to resolve and generally involve a large amount of trial and error. Conversely, issues that can be reproduced allow a more scientific, analytical approach to resolving them.

  3. Verify — This is a very important step in the process. You need to ensure that the ...
