Disaster recovery
If the swarm directory on a manager is lost or corrupted, immediately remove that manager from the cluster with the docker node remove nodeID command (add --force if the removal gets stuck).
The cluster administrator should not start a manager, or join it to the cluster, with an out-of-date swarm directory. Doing so brings the cluster into an inconsistent state, because all managers will try to synchronize the wrong data during the process.
After bringing down the manager with the corrupted directory, delete the /var/lib/docker/swarm/raft/wal and /var/lib/docker/swarm/raft/snap directories. Only after this step can the manager safely re-join ...
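
The two steps described above can be sketched as a small shell script. This is an illustration under assumptions, not a procedure taken from the book: the function names, the DRY_RUN switch, and the example node ID are hypothetical, and docker node rm is used as the short form of the remove command. By default the script only prints the commands it would run, so it can be reviewed before anything is deleted.

```shell
#!/bin/sh
# Hypothetical sketch of the recovery steps; names below are illustrative only.

# Raft state location named in the text.
SWARM_RAFT_DIR="/var/lib/docker/swarm/raft"

# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1, on a healthy manager: remove the broken manager from the cluster.
# Pass --force as the second argument only if the plain removal gets stuck.
remove_manager() {
    node_id="$1"
    if [ "${2:-}" = "--force" ]; then
        run docker node rm --force "$node_id"
    else
        run docker node rm "$node_id"
    fi
}

# Step 2, on the broken manager after it has been brought down:
# delete the stale Raft write-ahead log and snapshot directories.
clean_raft_state() {
    run rm -rf "$SWARM_RAFT_DIR/wal" "$SWARM_RAFT_DIR/snap"
}
```

A possible use is to source the file, call remove_manager with the node ID on a healthy manager, then call clean_raft_state on the affected host. Setting DRY_RUN=0 makes the functions execute the commands instead of printing them.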