High availability and fault tolerance

High availability, in simple terms, means achieving very close to 100% system uptime by ensuring that there is no single point of failure. This is typically done by incorporating redundancy mechanisms, such as backup processes taking over instantly from the failed ones and so on.

Mastering high availability

In Mesos, this is achieved using Apache ZooKeeper, a centralized coordination service. Multiple masters are set up (one active leader and other backups), with ZooKeeper coordinating the leader election and handling lead master detection by other Mesos components such as slaves and frameworks.

A minimum of three master nodes are required to maintain a quorum for a high availability setting. The recommendation ...

Get Mastering Mesos now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.