Resilience is about avoiding failure. Where stability mostly concerns itself with expected inputs, resilience concerns itself with what happens when your code is exposed to unexpected or nonroutine inputs. Resilience in software systems is also known as fault tolerance and is sometimes spoken about in terms of redundancies or contingencies. Fundamentally, these are all in service of the same goalâto minimize the effects of failure.
For critical systems, where lives depend on ongoing functionality, various contingencies are often built into the system. If a failure or fault arises, the system can isolate and tolerate that failure by utilizing its contingencies.
NASA, when building flight control systems for the Space Shuttle, ...