Michael Nygard wrote Release It! Design and Deploy Production-Ready Software (Pragmatic Bookshelf), which won a Jolt Productivity award in 2008. His other writings can be found at http://www.michaelnygard.com/blog.
Hardware is fallible, so we add redundancy. This allows us to survive individual hardware failures, but increases the likelihood of having at least one failure present at any given time.
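The arithmetic behind that trade-off can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not in the original text: every component is assumed to fail independently, and the 1% chance of a component being down at any instant is a made-up figure. The function names are hypothetical.

```python
def p_any_failed(p_fail: float, n: int) -> float:
    """Chance that at least one of n independent components is failed."""
    return 1.0 - (1.0 - p_fail) ** n


def p_all_failed(p_fail: float, n: int) -> float:
    """Chance that all n independent components are failed at once."""
    return p_fail ** n


if __name__ == "__main__":
    P_FAIL = 0.01  # assumed per-component failure probability (illustrative)
    for n in (1, 2, 4, 8):
        # As n grows, total outage becomes rarer, but "something is broken"
        # becomes more common.
        print(n, p_any_failed(P_FAIL, n), p_all_failed(P_FAIL, n))
```

Under these assumptions, each added replica makes a total outage less likely while making it more likely that some component is failed right now, which is the tension the paragraph describes.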
Software is fallible. Our applications are made of software, so they’re vulnerable to failures. We add monitoring to tell us when the applications fail, but that monitoring is made of more software, so it too is fallible.
Humans make mistakes; we are fallible also. So, we automate actions, diagnostics, and processes. Automation removes the chance for an error of commission, but increases the chance of an error of omission. No automated system can respond to the same range of situations that a human can.
Therefore, we add monitoring to the automation. More software, more opportunities for failures.
Networks are built out of hardware, software, and very long wires. Therefore, networks are fallible. Even when they work, they are unpredictable because the state space of a large network is, for all practical purposes, infinite. Individual components may act deterministically, but still produce essentially chaotic behavior.
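A rough count shows why the state space is infinite for practical purposes. The sketch below assumes, purely for illustration, that each component is either up or down and that components vary independently; real networks have far more states per component. The function name and the comparison figure (a commonly cited order-of-magnitude estimate for atoms in the observable universe) are assumptions, not part of the original text.

```python
def joint_states(components: int, states_each: int = 2) -> int:
    """Number of joint configurations of independently varying components."""
    return states_each ** components


# Commonly cited rough order of magnitude; used only as a yardstick.
ATOMS_IN_UNIVERSE = 10 ** 80

if __name__ == "__main__":
    for n in (10, 100, 300):
        count = joint_states(n)
        # Print the component count, the number of decimal digits in the
        # state count, and whether it exceeds the yardstick.
        print(n, len(str(count)), count > ATOMS_IN_UNIVERSE)
```

Even with only two states per component, a few hundred components already yield more configurations than could ever be enumerated or tested.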
Every safety mechanism ...