Chapter 17. Troubleshooting

Troubleshooting is both an art and a science. The more artistic ability you develop, the faster you can fix problems. However, there is no escaping the science part of the equation. Troubleshooting is the scientific method applied with a divide-and-conquer approach.

With that in mind, you need to pool all your technical knowledge about how your subsystems work and interact with each other. When you encounter a problem, you need to put on your scientist's lab coat and hypothesize where that problem is coming from. If you combine the scientific method of testing your hypothesis (make only one change at a time, please) with the concept of divide and conquer, you can isolate the problem to a subsystem. Repeat the process and isolate the problem to part of a subsystem. Keep repeating the process until you know exactly what the problem is. That is a systematic approach to troubleshooting.
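The divide-and-conquer loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a VMware tool: the component names are hypothetical, and `path_works_through` stands in for whatever single test you run by hand (a ping, a log check, a swapped cable) to confirm that everything up to a given point in the chain is healthy. It also assumes there is a single fault and that every test past the fault fails.

```python
from typing import Callable, Optional, Sequence


def isolate_fault(
    components: Sequence[str],
    path_works_through: Callable[[int], bool],
) -> Optional[str]:
    """Return the first component where the chain stops working, or None.

    components         -- the subsystems in order, from one end of the path to the other
    path_works_through -- hypothetical test: True if everything up to and
                          including components[i] is working
    """
    # If the whole chain works end to end, there is nothing to isolate.
    if not components or path_works_through(len(components) - 1):
        return None

    lo, hi = 0, len(components) - 1  # the first faulty index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if path_works_through(mid):
            lo = mid + 1  # everything through mid is fine; look later in the chain
        else:
            hi = mid      # the fault is at mid or earlier
    return components[lo]


# Hypothetical chain of subsystems, for illustration only.
chain = [
    "guest OS",
    "virtual NIC",
    "virtual switch",
    "physical NIC",
    "physical switch",
    "storage array",
]

# Pretend the fault sits at index 3: tests pass only for points before it.
print(isolate_fault(chain, lambda i: i < 3))  # prints "physical NIC"
```

Each pass through the loop runs exactly one test and halves the suspect range, which mirrors the advice to change only one thing at a time. With six subsystems, three tests inside the loop are enough to name the culprit.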

If you take a random approach instead, the best-case scenario is that you take longer to isolate and fix the problem. The worst-case scenario is that you make the problem worse.

In this chapter, we examine a systematic approach to troubleshooting. Additionally, we discuss taking a proactive stance to prevent problems before they happen.

Asking Murphy What Can Go Wrong

Not to sound like a pessimist, but Murphy's Law applies to computers. Design your system with that in mind, and your problems will be few and far between. VMware is extremely dependent on its connected subsystems because a change ...
