Chapter 2. Elements of Reliability

Reliability is what separates a well-designed network from a bad one. Anybody can slap together a bunch of connections that will be reasonably reliable some of the time. Frequently, networks evolve gradually, growing into lumbering beasts that require continuous nursing to keep them operating. So, if you want to design a good network, it is critical to understand the features that can make it more or less reliable.

As discussed in Chapter 1, the network is built for business reasons, so reliability only makes sense in the context of meeting those business requirements. As I said earlier, by "business" I don't just mean money. Many networks are built for educational or research reasons. Some networks are operated as a public service. But in all cases, the network should be built for clearly defined reasons that justify the money being spent. Those reasons are what reliability must be measured against.

Defining Reliability

There are two main components to my definition of reliability. The first is fault tolerance. This means that devices can break down without affecting service. In practice, you might never see any failures in your key network devices. But if there is no inherent fault tolerance to protect against such failures, then the network is taking a great risk at the business's expense.
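To put a rough number on that risk, here is a minimal sketch of how redundancy changes the chance that service stays up. It is an illustration under stated assumptions rather than a method taken from this chapter: the function name `combined_availability` and the 99% figure are hypothetical, and the calculation assumes that the redundant devices fail independently and that any one working device is enough to carry the service.

```python
def combined_availability(device_availability: float, redundant_devices: int) -> float:
    """Availability of a group of redundant devices.

    Assumptions (illustrative only):
      - each device is up with the same probability, device_availability
      - devices fail independently of one another
      - service survives as long as at least one device is up
    """
    if not 0.0 <= device_availability <= 1.0:
        raise ValueError("device_availability must be between 0 and 1")
    if redundant_devices < 1:
        raise ValueError("redundant_devices must be at least 1")
    # Service is lost only when every device is down at the same time.
    probability_all_down = (1.0 - device_availability) ** redundant_devices
    return 1.0 - probability_all_down


if __name__ == "__main__":
    # Hypothetical device that is up 99% of the time.
    for count in (1, 2, 3):
        print(count, combined_availability(0.99, count))
```

Under these assumptions, a second device moves the group from 99% to 99.99% availability. Real devices often share power, cabling, or software, so their failures are not fully independent and the actual improvement is usually smaller.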

The second key component to reliability is more a matter of performance and capacity than of fault tolerance. The network must meet its peak load requirements sufficiently ...
