Chapter 24. Intermittent Problems
Life is just easier when you have a reproducible test case. Nancy, for example, knew to tee up the Xerox bill for me to see. Phyllis knew just what to run to convince herself whether the system was sufficiently “faster enough” after this Oracle guy’s disk rebalancing act. But what do you do when your problem is unpredictable, when you can’t reproduce it every time?
We see it a lot. It’s kind of ironic: you’d think a problem that happens only rarely would be less of a problem than one that happens all the time. But often the intermittent problems carry the highest business priority.
You can diagnose intermittent performance problems the same way convenience stores identify robbers: they leave the cameras running all the time. Likewise, you can trace every execution of a troublesome program until you can catch one in the act of misbehaving. Trace as much as you need, but as little as you can. If you can target just a particular feature that’s misbehaving, then do that. If you can’t, then trace just one program, or one application, or one user. Even if your software gives you lots of control over what you trace, sometimes you just have to trace everything.
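One way to picture "leave the cameras running, but point them at as little as you can" is the minimal Python sketch below. It is an illustration under stated assumptions, not any product's actual trace feature: the names `TRACED_USERS`, `TRACE_LOG`, `traced`, `slow_calls`, and `run_report` are all hypothetical, and the scope here is a set of user names. Every in-scope call is timed, out-of-scope calls run unmeasured, and the misbehaving executions are picked out afterward.

```python
import time
from functools import wraps

# Hypothetical scope: trace only these users. An empty set means "trace everyone".
TRACED_USERS = {"alice"}
TRACE_LOG = []  # collected (user, function name, elapsed seconds) records


def traced(func):
    """Record the elapsed time of every in-scope call to func."""
    @wraps(func)
    def wrapper(user, *args, **kwargs):
        in_scope = not TRACED_USERS or user in TRACED_USERS
        if not in_scope:
            return func(user, *args, **kwargs)  # out of scope: no measurement
        start = time.perf_counter()
        try:
            return func(user, *args, **kwargs)
        finally:
            # Log the call even if it raised, so failures are caught in the act too.
            TRACE_LOG.append((user, func.__name__, time.perf_counter() - start))
    return wrapper


def slow_calls(threshold_seconds):
    """Return the traced calls that took at least threshold_seconds."""
    return [rec for rec in TRACE_LOG if rec[2] >= threshold_seconds]


@traced
def run_report(user, delay=0.0):
    time.sleep(delay)  # stand-in for real work
    return f"report for {user}"
```

Calling `run_report("alice", delay=0.05)` adds a record to `TRACE_LOG`, while `run_report("bob")` runs untraced. After the problem has recurred, `slow_calls(0.04)` returns only the executions that misbehaved. Emptying `TRACED_USERS` is the "sometimes you just have to trace everything" case.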
There are two reasons you should trace the smallest scope you can:
- You don’t want the trace itself to damage anyone’s performance. A well-designed trace feature will incur as little measurement intrusion effect as possible. For example, in the Oracle world I’ve worked in for so long, the database’s ...