Chapter 12. Building a distributed system

This chapter covers

  • Working with distribution primitives
  • Building a fault-tolerant cluster
  • Network considerations

Now that you have a to-do HTTP server in place, it’s time to make it more reliable. To have a truly reliable system, you need to run it on multiple machines. A single machine represents a single point of failure, because a machine crash leads to a system crash. In contrast, a system running on a cluster of multiple machines can continue providing service even when individual machines are taken down. Moreover, clustering multiple machines lets you scale horizontally: when demand for the system increases, you can add more machines to the cluster to accommodate the extra load. ...

