Docker Swarm (or any other scheduler) takes care of self-healing. As long as there is enough hardware capacity, it makes sure that the desired number of replicas of each service is (almost) always up and running. If a replica goes down, it is rescheduled. If a whole node is destroyed or loses its connection to the managers, all the replicas that were running on it are rescheduled onto the remaining healthy nodes. Self-healing comes out of the box. Still, there are quite a few other tasks we need to define if we want our solution to be self-sufficient and (almost) fully autonomous.
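To make the self-healing idea a bit more concrete, here is a minimal sketch of a reconciliation pass: compare the desired number of replicas with what is actually running and schedule whatever is missing. It is a toy model and not how Docker Swarm is implemented. The `Cluster` class, the `reconcile` function, the node and service names, and the "least-loaded node" placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy cluster (hypothetical): maps a node name to the services running on it."""
    nodes: dict[str, list[str]] = field(default_factory=dict)

    def running(self, service: str) -> int:
        # Count replicas of the service across all nodes that still exist.
        return sum(replicas.count(service) for replicas in self.nodes.values())


def reconcile(cluster: Cluster, desired: dict[str, int]) -> list[tuple[str, str]]:
    """Schedule missing replicas and return the (service, node) placements made."""
    placements = []
    for service, want in desired.items():
        missing = want - cluster.running(service)
        for _ in range(missing):  # does nothing when missing <= 0
            if not cluster.nodes:
                break  # no nodes left, so there is no capacity to schedule onto
            # Assumed placement rule: pick the node with the fewest replicas.
            target = min(cluster.nodes, key=lambda n: len(cluster.nodes[n]))
            cluster.nodes[target].append(service)
            placements.append((service, target))
    return placements


# Three nodes, with the desired state of three "web" replicas and one "db".
cluster = Cluster({"node-1": ["web", "db"], "node-2": ["web"], "node-3": ["web"]})
desired = {"web": 3, "db": 1}

del cluster.nodes["node-3"]         # simulate a destroyed node
print(reconcile(cluster, desired))  # [('web', 'node-2')]
```

The replica lost with `node-3` ends up on `node-2`, and a second pass would find nothing left to do. A real scheduler runs this kind of comparison continuously and also takes resource reservations, constraints, and node health into account.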