Chapter 6. Replication
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.
Douglas Adams, Mostly Harmless (1992)
Replication means keeping a copy of the same data on multiple machines that are connected via a network. As discussed in “Distributed Versus Single-Node Systems”, there are several reasons why you might want to replicate data:
-
To keep data geographically close to your users (and thus reduce access latency)
-
To allow the system to continue working even if some of its parts have failed (and thus increase availability and durability)
-
To scale out the number of machines that can serve read queries (and thus increase read throughput)
In this chapter we will assume that your dataset is small enough that each machine can hold a copy of the entire dataset. In Chapter 7 we will relax that assumption and discuss sharding (partitioning ...