Failures in Complex Systems

This section discusses failure modes and effects in complex systems. From this discussion you can gain an appreciation of how complex systems can fail in complex ways. Before you can design a system to recover from failures, you must understand how systems fail.

Failures are the primary focus of the systems architect designing highly available (HA) systems. Understanding the probability, causes, effects, detection, and recovery of failures is critical to building successful HA systems. The professional HA expert has many years of study and experience with a large variety of esoteric systems and tools that are used to design HA systems. The average systems architect is not likely to have such tools or experience but ...

Get Designing Enterprise Solutions with Sun™ Cluster 3.0 now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.