8 Fault-tolerance basics

This chapter covers:

  • Runtime errors
  • Errors in concurrent systems
  • Supervisors

Fault-tolerance is a first-class concept in BEAM, the Erlang virtual machine. The need to build reliable systems that keep operating even when faced with runtime errors is what motivated the creation of Erlang in the first place.

The aim of fault-tolerance is to acknowledge the existence of failures, minimize their impact, and ultimately recover without human intervention. In a sufficiently complex system, many things can go wrong. Occasional bugs will happen, components you’re depending on may fail, and you may experience hardware failures. A system may also become overloaded and fail to cope with an increased incoming request rate. Finally, if a system is distributed, you can experience ...

