Chapter 8. Fault-tolerance basics
This chapter covers
- Runtime errors
- Errors in concurrent systems
Fault tolerance is a first-class concept in BEAM. The ability to develop reliable systems that can operate even when faced with runtime errors is what brought us Erlang in the first place.
The aim of fault tolerance is to acknowledge the existence of failures, minimize their impact, and ultimately recover without human intervention. In a sufficiently complex system, many things can go wrong. Occasional bugs will happen, components you depend on may fail, and you may experience hardware failures. A system may also become overloaded and fail to cope with an increased incoming request rate. Finally, if a system is distributed, ...