Chapter 2. Fault Tolerant Mindset

The previous chapter defined the basic vocabulary and the four phases of fault tolerance. This chapter looks at techniques for designing fault tolerant software and for improving reliability and availability.

Fault Tolerant Mindset

What can go wrong in any given situation? That is a key question for anyone trying to develop fault tolerant software. Thinking to ask the question and define the solution is called having a Fault Tolerant Mindset. In almost any situation something can go wrong, and a fault tolerant program is prepared for these errors. Asking what-if questions, and planning during design for the errors that might happen during execution, are the hallmarks of the Fault Tolerant Mindset. What if the stack pointer becomes negative? What if the wrong subclass is instantiated? What if the message arrives out of order?
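As one illustration of turning a what-if question into a design decision, the sketch below takes the last question, a message arriving out of order, and answers it in code. It is a minimal sketch under stated assumptions, not a method taken from this chapter: the names `Message` and `OrderedReceiver` are hypothetical, and it assumes every message carries a sequence number assigned by the sender.

```python
from dataclasses import dataclass


@dataclass
class Message:
    seq: int       # sequence number assigned by the sender (assumed)
    payload: str


class OrderedReceiver:
    """Hypothetical receiver that delivers payloads in sequence order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # next sequence number we expect
        self.pending = {}           # early arrivals, keyed by seq
        self.delivered = []         # payloads released in order

    def receive(self, msg):
        # What if the message was already handled, or is already waiting?
        if msg.seq < self.next_seq or msg.seq in self.pending:
            return "duplicate"
        # What if it arrived early? Hold it until the gap is filled.
        self.pending[msg.seq] = msg
        status = "buffered" if msg.seq > self.next_seq else "delivered"
        # Release every message that is now contiguous with the last one.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq).payload)
            self.next_seq += 1
        return status


receiver = OrderedReceiver()
receiver.receive(Message(1, "second"))   # arrives early, is buffered
receiver.receive(Message(0, "first"))    # fills the gap, both are released
print(receiver.delivered)
```

The point is less the buffering itself than the habit: each branch exists because someone asked what could go wrong at that step and decided on a response during design.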

Applying a Fault Tolerant Mindset is beneficial at every stage of software development. This includes requirements definition and test development as well as the traditional phases of software creation (architecture, design, and coding).

Design Tradeoffs

'Every problem in computer science boils down to tradeoffs' – Professor L. J. Henschen.

Mean Time to Failure (MTTF) and Mean Time to Repair (MTTR) determine the reliability and availability of a system. These two parameters can be traded off against each other. In some contexts, MTTR is the more important attribute, especially if the system is striving for high availability. Examples include ...
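To make the tradeoff concrete, the sketch below uses the commonly cited steady-state relation, availability = MTTF / (MTTF + MTTR). The excerpt above does not state this formula, so treat it as an assumption brought in for illustration. The function name `availability` and the hour figures are likewise made up for the example.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability, assuming MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTTF and MTTR must be non-negative")
    total = mttf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTTF and MTTR cannot both be zero")
    return mttf_hours / total


# Illustrative numbers only: a system that fails about every 1000 hours.
baseline = availability(1000, 1.0)       # one hour to repair
longer_life = availability(2000, 1.0)    # failures half as frequent
faster_repair = availability(1000, 0.1)  # repairs ten times faster

for label, value in [("baseline", baseline),
                     ("double MTTF", longer_life),
                     ("one-tenth MTTR", faster_repair)]:
    print(f"{label:>15}: {value:.6f}")
```

With these assumed figures, cutting repair time to a tenth raises availability more than doubling the time between failures, which is one way to see why MTTR can be the more important attribute when high availability is the goal.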
