Chapter 4. Network

Let’s round out the fundamentals of systems by talking about the network. Networks are the communication bedrock of every system; they connect all your resources and services. Problems with the network lead to system failure. The critical nature of networks led to early specialization, with network admins managing the networking hardware. Microservices, virtualization, and containerization have brought a tectonic shift to building and managing today’s networks. More resources to interconnect, software-defined networking, and latency-dependent applications have all upset prior expectations of network admin skills, bringing some of these administration responsibilities back into scope for the systems team.

In this chapter, I explain the landscape of networking technologies (network virtualization, software-defined networks, and content distribution networks) so you can collaborate with your network and network security teams and build the skills to strengthen the interconnection of your system’s components.

Caring About Networks

Let’s revisit the example from the previous chapter of a contemporary product website; it’s the virtual front door of a business and an example system you might manage.

A user opens up a web browser on their phone to buy a product from your company. Their wireless service provider routes their request to a CDN that operates in a data center physically close to them. If the CDN doesn’t have the data to fulfill the request, their request is routed onward. Next, a load balancer routes the request to a physical server on which the hypervisor determines which VM to route to in your cloud-hosted infrastructure. Once the VM’s Linux kernel has received the traffic, your application processes the request, and the response follows a similar path back to the client.

A lot is going on here. How many different networks did you count? Each network introduces some processing as a router determines the best path to get to the next destination. More network hops and different network types with varying transmission speeds lead to inconsistent and long response times. How many types of network devices were involved?

Your users generally don’t care about these implementation details as long as the traffic gets through reliably. However, when requests aren’t making it through, it matters a lot, and you have to figure out what is happening right away. Rather than reacting to problems as they arise, it’s helpful to understand the context of your system’s networks and to build and manage them based on that knowledge. Understanding your system’s needs enables you to make informed choices, as illustrated in the example of caching data closer to clients with a CDN and routing requests to the appropriate destination with load balancers.

As with all decisions in the building blocks of your systems, the context of what you are building matters. Effective use of the resources and options available to you will reduce the cost to the humans on the team who manage the system, lessen the impact on your customers, and improve the business’s overall bottom line.

Key Characteristics of Networks

As with storage, there are a couple of primary ways to think about network options—wired versus wireless—and within each of these broad categories, there are different media (e.g., copper wire, fiber-optic cables) and communication protocols.

Networks have a topology: the arrangement of their elements and the flow of data between them. Depending on the medium, the network topology defines the layout of the physical cabling, the location of the different network resources, and the embedded fault tolerance. All of these factors play into the cost associated with the network.

The key characteristics of networks include the following:

Bandwidth
The capacity of the communication channel, usually described as a rate over a fixed time, e.g., megabits per second (Mbps) or gigabits per second (Gbps).

Latency
The time required for a signal to travel from one point to its destination, which depends on the physical distance the signal has to travel.

Network latency is more accurately defined by the end-to-end time to transmit the message (transmission time), the time for all the network devices along the way to process the request (processing delay), and the length of time taken up by the queue of requests waiting to be processed (queuing delay).

Jitter
The variance from the median latency. For a specific request, you can observe the network latency. To calculate the expected latency, you average some number of data points. Jitter describes the variance of that measurement. For workloads that depend on low-latency networks (e.g., audio and video streaming), jitter helps assess the quality of the network in terms of consistency.

Availability
The measure of the probability that the network is available. Different networks can tolerate different numbers of failures.
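These characteristics can be made concrete with a small calculation. The sketch below summarizes a set of latency samples; the sample values are invented, and jitter is computed here as the average absolute deviation from the median latency, which is one reasonable convention among several that tools use:

```python
import statistics

def summarize_latency(samples_ms):
    """Summarize latency measurements (in milliseconds).

    Jitter is reported as the average absolute deviation from the
    median latency; treat this as one convention, not a standard.
    """
    median = statistics.median(samples_ms)
    jitter = sum(abs(s - median) for s in samples_ms) / len(samples_ms)
    return {"median_ms": median, "jitter_ms": jitter}

# Hypothetical round-trip samples from repeated probes
samples = [20.1, 19.8, 21.3, 20.0, 35.2, 20.4]
stats = summarize_latency(samples)
```

Note how a single slow sample (35.2 ms) barely moves the median but shows up clearly in the jitter, which is why consistency-sensitive workloads watch jitter rather than averages alone.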

Build a Network

Imagine that you’re responsible for deploying a system to a data center. The system has a gateway that routes to an application that consists of a database and a bank of web servers. The data center provides backbone connectivity, but you’re responsible for everything else. So, what network resources will you need? Here are some that come to mind:

  • A firewall to filter ingress and egress traffic

  • A gateway router to accept incoming traffic from the public internet, steer it to internal resources that process that traffic, and relay outbound traffic from the internal hosts onward back to remote clients

  • A load balancer to distribute traffic among the web servers

  • Intrusion detection systems to protect the network from unauthorized external access and other suspicious network activity

  • A VPN gateway that grants authorized remote users elevated access to the private network

When factoring in your network’s needs, think about the traffic patterns, type, and amount of traffic.

Networks are often described in terms of their available bandwidth. However, even if two compute environments both have high-bandwidth connections to the broader internet, their physical separation may limit the quality of interaction because of latency or jitter.

The Open Systems Interconnection (OSI) reference model is a seven-layer architecture used to visualize details of protocol and interface implementation. For example, traditional load balancing is called Layer 4 (L4) load balancing because it occurs at the fourth layer, transport. In this type of load balancing, the network device or application distributes requests based on the source and destination IP addresses and ports, without deeper introspection into the content of the packets. Layer 7 (L7) load balancing occurs at the seventh layer, application. Networks or applications that use application load balancing distribute requests based on the request’s characteristics.

The labels aren’t perfectly accurate, but they capture enough context to differentiate their use. For example, L4 load balancing could be described more accurately as L3/L4 load balancing because the load balancer uses network and transport characteristics to distribute requests. And L7 load balancing could be described more accurately as L5–L7 load balancing because the load balancer uses session, presentation, and application protocol characteristics to identify the best destination for requests.

Early L7 load balancing was very expensive due to the compute necessary to process requests. Now, with advances in technology, the cost difference between L4 and L7 implementations is negligible compared to the benefits of the greater flexibility and efficiency of L7 load balancing.1
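The L4-versus-L7 distinction can be sketched in a few lines. This toy dispatcher (the backend pool names and routing rules are invented for illustration) picks a backend the way an L4 balancer might, hashing only the connection tuple, versus the way an L7 balancer might, inspecting the request path:

```python
import hashlib

BACKENDS = ["web1", "web2", "web3"]  # hypothetical server pool

def l4_pick(src_ip, src_port, dst_ip, dst_port):
    """L4-style: choose a backend from the connection tuple only."""
    key = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return BACKENDS[digest % len(BACKENDS)]

def l7_pick(request_path):
    """L7-style: route on the request's content (here, the URL path)."""
    if request_path.startswith("/api/"):
        return "api-pool"
    if request_path.startswith("/static/"):
        return "cache-pool"
    return "web-pool"
```

The L4 version never looks inside the packet payload, so it can’t tell an API call from an image fetch; the L7 version can, which is the flexibility the text describes.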

Recall the five layers of the Internet model from Table 1-2.

Each layer communicates with the interfaces above and below it via a message object specific to that layer. Layering separates the roles and responsibilities, enabling humans to build (and change) different parts of communication protocols, which has fueled much of modern network transformation.


Creating a network comes down to two things: an ability to send and receive data and a mechanism to make decisions about how to do so.

In the past, you would buy a dedicated single-purpose device for each network function. Now you can deploy virtualized versions of these components using techniques similar to your other infrastructure resources. Just as service providers have virtualized traditional server roles (i.e., databases and web servers), providers virtualize network services on generic network equipment, so you can run software that manages how the hardware transmits and receives data.

However, you can’t virtualize all aspects of networking. For example, communication with remote hosts necessarily involves physical data channels, such as Ethernet cables, transoceanic fiber-optic lines, satellite uplinks, or WiFi adapters. These channels are different enough to require specific hardware to handle the data link operations. But that’s the advantage of the layered separation of protocol implementations and interfaces: as long as the physical layer resources are in place and working, you have the flexibility to set up the transport and network resources as you see fit.

The ability to deploy arbitrary network functionality on generic hardware empowers us with tremendous flexibility. You don’t have to acquire specialized equipment and then go to a data center to “rack and stack” it when an API call can fulfill the same need. Instead, network resources can scale vertically and horizontally with the rest of your infrastructure.

Software-Defined Networks

With the proliferation of deployed network resources at scale, your challenge is managing and protecting these resources in a cohesive, holistic way. Early approaches to internetworking used a decentralized philosophy where routers had only a vague sense of how to relay traffic to its final destination. A decentralized philosophy made the internet resilient enough to recover from natural disasters but didn’t guarantee network stability. Moreover, this approach didn’t account for the evolving nature of security. While early engineers designed the internet to survive network segmentation, they didn’t consider malware threats like the Morris worm or the ubiquitous integration of computers into daily life, making everyone much more vulnerable to malicious activity.

Consider the challenges faced by a university network administrator. The institution provides certain computing resources (i.e., servers, workstations, and printers) and allows students and faculty to use their own devices (i.e., laptops, tablets, and phones). While the IT department patches and physically secures the university’s equipment, it’s much harder to enforce specific security policies on other people’s equipment. As a result, it’s only a matter of time before there’s a problem with malware, ransomware, or viruses originating from unsecured personal devices.

Software-defined networking (SDN) provides tools to help you manage and protect your resources. SDN is an approach to network management that conceptualizes entire networks as a single programmable computer. Just as conventional computers use an OS to orchestrate hardware resources on behalf of high-level applications, SDNs introduce a centralized framework for coordinating the operations of a distributed network, activating resources as needed, automatically adapting to volatile conditions, and allowing you to push out uniform policies.

So a network admin could run a threat intelligence management application, combined with shared threat sources, to compile a denylist of malicious websites. Then, when device owners attempt to visit a malicious website, they are directed to a warning page so that they can take the appropriate actions.
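That workflow can be sketched simply: merge denylists from multiple feeds, then consult the result before forwarding a request. The feed contents and warning-page hostname here are invented for illustration:

```python
# Hypothetical threat feeds, each a set of known-bad hostnames
FEED_A = {"malware.example", "phish.example"}
FEED_B = {"phish.example", "botnet.example"}

def build_denylist(*feeds):
    """Merge shared threat sources into one denylist."""
    merged = set()
    for feed in feeds:
        merged |= feed
    return merged

def route(hostname, denylist, warning_page="warn.internal.example"):
    """Send denied hosts to a warning page; pass others through."""
    return warning_page if hostname in denylist else hostname

DENYLIST = build_denylist(FEED_A, FEED_B)
```

In a real SDN deployment the controller would push the resulting policy to devices rather than evaluate it per request in Python, but the logic is the same.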

The defining attribute of SDNs is the use of a high-level control plane to govern the operation of the activity on individual network devices. While providers optimize software on the data or forwarding plane for speed, simplicity, and consistency, the control plane provides a flexible interface for defining policies and handling exceptions.

SDN architecture uses a centralized, programmable controller that oversees network operations. The controller uses southbound APIs to communicate with devices such as routers and firewalls, pushing configuration down and relaying state information back up, and northbound APIs to expose network state and control to applications. Many SDN implementations use the OpenFlow protocol to manage network devices in a vendor-agnostic way. As long as the physical or virtual equipment supports a programmatic interface for defining how to route or drop traffic, you can govern it with an SDN controller.

Multiple SDN controller applications can participate simultaneously. For example, some control plane applications focus on deployment and provisioning operations, others may meter traffic for billing purposes, and others can handle various aspects of network security.

Segmentation is another way to protect your networks. Segmenting your network can optimize traffic flow for legitimate uses of the network and contain the damage done in the event of a malware attack or data breach. With machine learning, modern software-defined networks can automatically learn to identify usage patterns and use this information to guide the operation of microsegments. Still, as with all machine learning systems, the outcomes are only as good as their training data.

Content Distribution Networks

A key element of smooth system operation is responsive network services. Users have come to expect near-instantaneous response times and assume that things are broken if there are any delays. And yet, no amount of computing power can overcome the speed of light. The farther away your users are, the more noticeable the delay.

Consider a site operating from San Francisco as depicted in Table 4-1 with the following assumptions:

  • All sites are connected to San Francisco with fiber in a straight line at the stated distance.2

  • Signals propagate through fiber at approximately 5 ms per 1,000 km.

Table 4-1. Distance and average latency from San Francisco to other sites

                             New York City  London     Tokyo      Sydney     Johannesburg
Distance from San Francisco  4,130 km       11,027 km  17,944 km  11,934 km  16,958 km
Latency (one-way)            21 ms          55 ms      90 ms      60 ms      85 ms
Round-trip time              42 ms          110 ms     180 ms     120 ms     170 ms

Now, multiply the round-trip time (RTT) by the number of requests needed to load the site. The experience of users accessing the site from New York City versus Tokyo differs markedly. In the real world, we also have to factor in that most places are not connected by fiber in straight lines, different media have different latencies, and every network hop adds processing delay as devices determine the route. Also, there are no guarantees about other traffic on the same network segments.
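The table’s numbers follow directly from the stated assumption of roughly 5 ms per 1,000 km in fiber, which you can check yourself:

```python
MS_PER_1000_KM = 5  # stated propagation assumption for fiber

def one_way_latency_ms(distance_km):
    """Best-case one-way propagation delay over fiber."""
    return distance_km * MS_PER_1000_KM / 1000

def round_trip_ms(distance_km):
    """Best-case round-trip time: there and back again."""
    return 2 * one_way_latency_ms(distance_km)

# Distances from San Francisco (km), per Table 4-1
sites = {"New York City": 4130, "London": 11027, "Tokyo": 17944}
```

These are floors, not estimates: real paths zigzag between exchange points and add per-hop processing, so measured latencies will always exceed these values.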

To overcome the limitations of network latency between sites, you need a copy of your site somewhere close enough to your customers so that these delays are negligible. While you could do this by building out a global network of your own, it’s far simpler to outsource the work to CDNs, which take on the burden of operating a global array of data centers called points of presence (PoPs). By distributing your site to a local PoP, you can lower the response time for users close to those points to less than 1 ms.

Choose your CDN based on the set of features (e.g., availability, regions served, and routing options) that optimize your expenditure. With a CDN, you can do the following:

  • Improve load times by distributing content closer to your consumers.

  • Reduce the cost of bandwidth. Instead of making multiple redundant cross-country trips, most requests stay on the edge and pull from cached content.

  • Increase availability and redundancy by having numerous global copies of your content.

  • Improve security by mitigating the impact of a distributed denial-of-service (DDoS) attack. In a DDoS attack, malicious actors attempt to flood a site with traffic to exhaust a system’s resources. Some CDN providers can prevent the malicious traffic from ever reaching your servers, so your users won’t perceive any downtime.

Using a CDN helps solve some of your problems, but it does add a layer of complexity in managing services, the specific configurations provided by your CDN, and your site’s caches.
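The core mechanism a PoP relies on, serving cached copies until they expire, can be sketched as a small TTL cache. The origin fetch here is a stand-in function, and real CDNs layer far more logic (validation, cache keys, vary headers) on top of this idea:

```python
import time

class EdgeCache:
    """A toy TTL cache illustrating how a PoP serves cached content."""

    def __init__(self, ttl_seconds, fetch_from_origin):
        self.ttl = ttl_seconds
        self.fetch = fetch_from_origin  # called only on cache misses
        self.store = {}  # path -> (content, expiry_timestamp)
        self.origin_hits = 0

    def get(self, path, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(path)
        if entry and entry[1] > now:
            return entry[0]          # served from the edge
        content = self.fetch(path)   # miss or expired: go to origin
        self.origin_hits += 1
        self.store[path] = (content, now + self.ttl)
        return content

    def purge(self, path):
        """Targeted purge of one path, far cheaper than clearing everything."""
        self.store.pop(path, None)
```

The `purge` method mirrors the advice below: invalidating a single path forces one origin fetch, while clearing the whole store would force one per cached object.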

If you are currently using a CDN, check your service provider’s documentation to figure out when you should clear cached resources. Consider situations such as these:

  • Problems are occurring for a subset of your users. For example, someone pushed a change that had unintended consequences based on existing cached data.

  • Problems are occurring for all of your users. For example, you had a bad site build.

In general, avoid purging your entire cache because doing so would cause a cascade of requests to repopulate the cache.


If you are using caches with a web server, take some time to learn about web cache poisoning, an attack in which an attacker exploits a vulnerability in your (unpatched) web server to alter your cached data, which is then served to other users. James Kettle provides a great resource on how caches work and how web cache poisoning occurs.

Guidelines to Your Network Strategy

With your understanding of the landscape of networking technologies (network virtualization, software-defined networks, and content distribution networks), you can start to build your network strategy. Consider the following:

  • Understand your latency needs. Consider bringing necessary systems closer to the end users to improve latency, whether through caching, mirrored systems, or segmented data. This means having a good understanding of how and where your users connect to you: via phones (unreliable wireless availability), laptops (mostly reliable wireless connections), or hardwired connections, and across what physical distance (e.g., global markets).

  • Leverage new protocols in your systems:

    • Use HTTP/2 to provide a faster and higher-quality user experience.

    • Use QUIC networking to maintain a seamless connection even when mobile users switch between network connections.

  • Keep informed of internet security threats, and monitor advisories related to the software you use.
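As one concrete example of adopting these protocols, an nginx server block can enable HTTP/2 and, with nginx 1.25 or later, HTTP/3 over QUIC. This is a hedged sketch to adapt to your own setup; the domain and certificate paths are invented, and directive availability depends on your nginx version and build:

```nginx
server {
    listen 443 ssl;             # TCP listener for HTTP/1.1 and HTTP/2
    listen 443 quic reuseport;  # UDP listener for HTTP/3 (nginx 1.25+)
    http2 on;                   # enable HTTP/2 (directive form in 1.25.1+)

    server_name www.example.com;           # hypothetical domain
    ssl_certificate     /etc/ssl/example.crt;  # hypothetical paths
    ssl_certificate_key /etc/ssl/example.key;

    # Advertise HTTP/3 to clients already connected over TCP
    add_header Alt-Svc 'h3=":443"; ma=86400';
}
```

On older nginx versions, HTTP/2 is enabled with `listen 443 ssl http2;` instead of the `http2 on;` directive.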

Wrapping Up

Whether wired, wireless, or virtualized, networks are how the resources and services that you manage exchange data with one another. Just as the rise of devops blurred the boundary between system administration and software engineering, the distinction between sysadmins and network admins is blurring too.

Modern software-defined networks take a centralized approach to route network traffic efficiently while providing network operators with tools to regulate traffic, protect against malware, defend against unauthorized activity, and handle billing for metered users. Similarly, content distribution networks provide a better experience for a global population of users by caching website data at facilities that are physically close to your users.

When you begin to set up and manage your network infrastructure, you need to consider how different resources on your network communicate with one another, how much data they’re exchanging, and how tolerant they are of latency delays. Using modern approaches can provide you and your users with a fast, secure, and resilient network.

1 Learn more about layer 7 load balancing from the NGINX documentation.

2 In reality, networks don’t connect this way. A complex set of partnerships and geographic locations have different levels of network infrastructure. Learn more about internet exchange points and how internet service providers and CDNs connect from the Cloudflare Learning Center post.
