Recovering from a disaster

Failures are a regular occurrence in large clusters. Hard drives fail, servers fail, and even entire data centers go dark. Shifting services to cloud platforms such as AWS and Azure has helped, but even they have had entire regions go down. Using containers may make your applications more resistant to failure, but the hosts running those containers are still vulnerable to any number of problems. A properly engineered cluster should be able to cope with disaster. Here are a few things to keep in mind to keep your cluster safe.

Restarting the full cluster

There may be times when the entire swarm has to be shut down. Hopefully, there will be time to properly shut down the running services and the hosts. When the time comes to ...
