Skip to Content
Designing Data-Intensive Applications, 2nd Edition
book

Designing Data-Intensive Applications, 2nd Edition

by Martin Kleppmann, Chris Riccomini
February 2026
Intermediate to advanced
650 pages
22h 2m
English
O'Reilly Media, Inc.
Content preview from Designing Data-Intensive Applications, 2nd Edition

Chapter 6. Replication

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.

Douglas Adams, Mostly Harmless (1992)

Replication means keeping a copy of the same data on multiple machines that are connected via a network. As discussed in “Distributed Versus Single-Node Systems”, there are several reasons why you might want to replicate data:

  • To keep data geographically close to your users (and thus reduce access latency)

  • To allow the system to continue working even if some of its parts have failed (and thus increase availability and durability)

  • To scale out the number of machines that can serve read queries (and thus increase read throughput)

In this chapter we will assume that your dataset is small enough that each machine can hold a copy of the entire dataset. In Chapter 7 we will relax that assumption and discuss sharding (partitioning ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Designing Data-Intensive Applications

Designing Data-Intensive Applications

Martin Kleppmann

Publisher Resources

ISBN: 9781098119058Errata Page