So far, we've developed a model of distribution in which the total data set is distributed among multiple machines, but any given piece of data lives on only one machine. This model carries a big advantage over a single-node configuration, which is that it's horizontally scalable. By distributing data over multiple machines, we can accommodate ever-larger data sets simply by adding more machines to our cluster.
But our current model doesn't solve the problem of fault tolerance. No hardware is perfect; any production deployment must acknowledge the fact that a machine might fail. Our current model isn't resilient to such failures: for instance, if Node 1 in our original three-node cluster were to suddenly catch ...