8 Fault tolerance basics

This chapter covers

  • Run-time errors
  • Errors in concurrent systems
  • Supervisors

Fault tolerance is a first-class concept in BEAM. The ability to develop reliable systems that can operate even when faced with run-time errors is what brought us Erlang in the first place.

The aim of fault tolerance is to acknowledge the existence of failures, minimize their impact, and, ultimately, recover without human intervention. In a sufficiently complex system, many things can go wrong. Occasional bugs will happen, components you’re depending on may fail, and you may experience hardware failures. A system may also become overloaded and fail to cope with an increased incoming request rate. Finally, if a system is distributed, you can ...
