Chapter 1. What Is Network Automation?

Network Design

Network automation starts with the design of a network. A good network is, first of all, one whose composition, topology, and configuration (its design) are always completely known. One way to achieve this goal is to automate the design of a new network itself; another is to automate the representation of an already existing network through automated network documentation. With new networks, design automation gives the network designers the responsibility to describe the desired results: how many network segments they want, how those segments are connected, how they reach the internet, how they are protected (for example, allowing only email or web access), the minimum bandwidth to reserve for video conferencing services, and so on. With such information, the automation system can then take care of all the low-level details, such as mapping out how many switches or routers should be deployed and describing how to connect and configure them. For existing networks, design automation entails certifying or auditing the actual topology and producing a similar map by probing all the devices active on the network, extracting their configuration parameters, and inferring topology and other information from them.
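To make the idea of design automation concrete, here is a minimal sketch of how a high-level design "intent" could be expanded into low-level device requirements. The intent schema, the field names, and the sizing rule (48 access ports per switch) are all illustrative assumptions for this example, not part of any real tool.

```python
# A minimal sketch of design automation: a high-level intent describing the
# desired network segments is expanded into per-segment device requirements.
# Schema and sizing rules are illustrative assumptions, not a real product.

ACCESS_PORTS_PER_SWITCH = 48  # assumed port density of one access switch


def plan_segments(intent):
    """Expand a high-level design intent into per-segment device counts."""
    plan = []
    for seg in intent["segments"]:
        hosts = seg["hosts"]
        # Round up: each access switch serves a fixed number of ports.
        switches = -(-hosts // ACCESS_PORTS_PER_SWITCH)
        plan.append({
            "name": seg["name"],
            "access_switches": switches,
            "uplinks_per_switch": 2 if seg.get("redundant", True) else 1,
            "internet_access": seg.get("internet", "web-and-email"),
        })
    return plan


intent = {
    "segments": [
        {"name": "office", "hosts": 120, "internet": "web-and-email"},
        {"name": "video-conf", "hosts": 30, "redundant": True},
    ]
}

for entry in plan_segments(intent):
    print(entry)
```

The key design choice is that the intent only states outcomes (segments, host counts, protection level); the mapping to switch counts and uplinks is entirely the automation system's job.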

When a network’s entire actual structure and composition are completely known, every part of the network becomes easy to upgrade, replace, or restore after faults, at the smallest overall cost. As obvious as this thesis may seem, in the real world, “completely known” is exactly where the problems start. System administration folklore abounds with stories of ghost servers or switches, long forgotten in some basement but still running and connected to the internet—that is, equipment that hosts content that should not be there, runs software that begs to be attacked, or, in the best possible case, wastes bandwidth and electricity for no reason at all. Even when no ghost devices are present, company reorganizations, acquisitions, or relocations to new facilities can cause unpleasant surprises, creating designs like those depicted in Figure 1-2. In all such circumstances, if the network’s inventories and maps do not match reality, administrators will consume precious time just to be sure of what they should do first.

In addition to not knowing that a device exists or where it sits in the network, administrators can just as easily forget how devices are actually configured and why those configuration choices were made. When network requirements change but static legacy device configurations remain in place, bandwidth bottlenecks and other performance degradation (read: unnecessary costs and needlessly poor user experience) can stay hidden for years. Moving content and services to the cloud, whether public or private, only increases the likelihood of such surprises.

Figure 1-2. Patchwork versus planned design. In the patchwork design at the top, typical of M&A or unplanned organic growth, users go all over the place to access the services they need. In the planned design at the bottom, services are centralized and access paths are clearly defined. It is not perfect, but it is much better thought out.

This lack of visibility and inefficient network design should never happen in a good network. The basic methods and approaches to proper network design and visibility are well known, in principle, and are all based on open technologies. There are many ways to collect the make, model, serial number, virtual local area network (VLAN) and IP addresses, Address Resolution Protocol (ARP) tables, and every other detail for every device on the network. Standard protocols and procedures, from wire tracing and port mapping to Cisco Discovery Protocol (CDP) and Link Layer Discovery Protocol (LLDP), can gather all the data a technician needs to infer and draw a full diagram of the whole network and understand the entire network design. The point is, neither the collection of that raw data nor its formatting and presentation should ever be done, or kept current, manually. Doing so would surely cost more in staff time than adopting solutions already proven in many other organizations, with no guarantee that errors would be avoided. The same is true, in almost all cases, for carefully crafted in-house custom scripts and tools, which invariably end up consuming much more maintenance time than originally expected.
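As a small illustration of the step from raw discovery data to a topology map, the sketch below turns LLDP-style neighbor records (however they were collected) into a deduplicated link list that a diagramming tool could render. The record format is an assumption made for this example; real LLDP output varies by vendor and collection method.

```python
# Illustrative sketch: deduplicating raw LLDP-style neighbor records into an
# adjacency map. The tuple record format is an assumption for the example.

neighbors = [
    # (local device, local port, remote device, remote port)
    ("sw-core-1", "Gi1/0/1", "sw-access-1", "Gi0/48"),
    ("sw-core-1", "Gi1/0/2", "sw-access-2", "Gi0/48"),
    # The same physical link reported again from the other end:
    ("sw-access-1", "Gi0/48", "sw-core-1", "Gi1/0/1"),
]


def build_topology(records):
    """Collapse bidirectional LLDP records into one entry per physical link."""
    links = {}
    for local, lport, remote, rport in records:
        key = tuple(sorted([local, remote]))  # same link seen from both ends
        links.setdefault(key, (local, lport, remote, rport))
    return links


for (a, b), (_, lport, _, rport) in build_topology(neighbors).items():
    print(f"{a} <-> {b} ({lport} -- {rport})")
```

In a real deployment this deduplication would run continuously against freshly collected data, which is exactly what keeps the resulting map trustworthy.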

Asset inventories and everything else that is necessary to have complete network visibility and control in real time should be a given, not something that requires constant intervention or dedicated manual efforts. Maps that can show the exact design of a network, with detailed visibility into the location or switchport connections of every device, should constantly update by themselves, as soon as the network changes. Even higher-level operations—for example, partitioning a network in semi-independent zones that can be independently managed or updated one at a time, without affecting all the others—should happen with as little manual work as possible, following consistent but automatic procedures.

Network Configuration: Policies

A perfectly mapped network is still an ugly place without rules on how to use it and adapt it to its users’ needs, rules that all interested parties can set and follow without ambiguity. The most common examples of such rules and procedures, though by no means the only ones, are those that define bandwidth caps, access-control lists (ACLs), user quotas, password policies, and firewall rules. When we think about automating these rules and procedures, we can consider both policy definition and policy enforcement. Automating policy enforcement is exactly what network devices like firewalls are designed for, so when we speak of automating network policy, we mostly mean policy definition, or configuration. Describing and enforcing exactly how users must or may use assets should consume as little IT staff time as possible. Besides reducing the daily load on network administrators, automating policy definition brings two other big advantages: consistency and (self-)documentation. If policy definition is automated, all policies follow the same format and structure and look consistent to anyone reading your network design and documentation. Likewise, the documentation of a policy and its changes can be generated automatically at the time the policy is created, so your network documentation is always up to date.
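The consistency and self-documentation benefits follow naturally when policies live in one structured format and device configuration is rendered from it. The sketch below assumes a simple rule schema and a Cisco-like ACL syntax purely for illustration; real policy engines use richer models.

```python
# Sketch of automated policy definition: rules kept in one structured,
# self-documenting format and rendered into device configuration.
# The rule schema and the Cisco-like ACL syntax are illustrative assumptions.

RULES = [
    {"action": "permit", "proto": "tcp", "dst_port": 443,
     "comment": "Allow web access"},
    {"action": "permit", "proto": "tcp", "dst_port": 25,
     "comment": "Allow outbound email"},
    {"action": "deny", "proto": "ip",
     "comment": "Default deny everything else"},
]


def render_acl(name, rules):
    """Render structured policy rules as an ACL configuration snippet."""
    lines = [f"ip access-list extended {name}"]
    for rule in rules:
        # The comment travels with the rule, so documentation is automatic.
        lines.append(f" remark {rule['comment']}")
        entry = f" {rule['action']} {rule['proto']} any any"
        if "dst_port" in rule:
            entry += f" eq {rule['dst_port']}"
        lines.append(entry)
    return "\n".join(lines)


print(render_acl("OUTBOUND-POLICY", RULES))
```

Because every ACL is generated from the same schema, every rendered policy has the same shape, and the `remark` lines double as always-current documentation.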

Network Configuration: Provisioning

Clear, concretely enforceable rules on how to use the network are of little use if the network itself, or the services it makes accessible, is hard to change. With infrastructure as a service (IaaS), distributed teams, and work-from-anywhere becoming increasingly common, it becomes necessary to provide applications, services, and general connectivity to any combination of local and remote hardware and virtual platforms. To understand when, how, and why this provisioning could happen in practice, consider a software development scenario: the developers of some real-time collaboration software need to reproduce a reported bug. Those developers would work much better if they could reproduce the issue exactly as the client sees it, in the exact same network where the bug was first noticed, and at minimum cost. They would need a virtual network to play in, maybe for just a few hours or days, but with virtual switches, virtual firewalls, and so on that both reproduce the desired conditions and keep that area completely isolated from the rest of the network. Other examples might be a company that needs to set up a product demo at a conference, or a university that must run final exams in a temporary but tightly controlled network to avoid cheating. Both have very similar needs and would benefit from streamlined, automated provisioning.

These are just a few examples of why, to keep up with the pace of business, adding users, LANs, VPNs, virtual switches or firewalls, and more must be possible in real time, in ways that are transparent to end users and, to some extent, also to the network staff. In a fully automated workflow, for every situation like those just described, users should ideally be able to describe what they need and under which high-level conditions, without configuring intricate technical details manually. For example, “emulate a running website with up to a hundred simultaneous users, each with at least X upstream bandwidth, but isolated from the real internet” is a high-level description of a network to provision, with no need to include all the details. In other words, as far as provisioning is concerned, network automation must make it possible to perform and coordinate all these tasks always in the same way, from the same interface, regardless of where the relevant software and physical devices are, and by describing the desired outcome, that is, the final state the network should be in, rather than which options should be set to get there.
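The outcome-oriented request just described can be sketched as data that an orchestrator expands into concrete provisioning tasks. All field names and task names below are hypothetical, chosen only to illustrate the desired-state idea.

```python
# Sketch: a high-level provisioning request ("what I need") expanded into the
# low-level tasks an orchestrator would run. Field and task names are
# hypothetical, chosen for illustration only.

def expand_request(req):
    """Turn a desired-outcome description into concrete provisioning tasks."""
    tasks = [{"task": "create-virtual-network", "name": req["name"],
              "isolated": req.get("isolated", True)}]
    # Reserve aggregate bandwidth for the stated number of users.
    total_mbps = req["users"] * req["min_upstream_mbps"]
    tasks.append({"task": "reserve-bandwidth", "mbps": total_mbps})
    if req.get("isolated", True):
        # Keep the sandbox away from the real internet.
        tasks.append({"task": "add-firewall-rule",
                      "rule": "deny any to internet"})
    tasks.append({"task": "deploy-service", "image": req["service"]})
    return tasks


request = {
    "name": "bug-repro-env",
    "service": "website-demo",
    "users": 100,
    "min_upstream_mbps": 5,
    "isolated": True,
}

for task in expand_request(request):
    print(task)
```

The user states the final state (a hundred users, X bandwidth each, isolated); the bandwidth math, firewall rule, and deployment order are the orchestrator's concern.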

Life Cycle

Networks are most valuable when they are reliable, and a reliable network depends on managing the full life cycle—from initial deployment to end of life—of all of the underlying infrastructure that keeps the network up and running. To start, the predictable, regular updates of firmware and software are the simplest of several life cycle issues to consider. Company acquisitions or opening new remote offices are much more complex, but they are likely to happen—in most cases, at least—with enough notice to allow proper planning of how the network should be expanded or redesigned.

A number of less predictable updates occur throughout a device’s life cycle as well. Take identified vulnerabilities and the subsequent security patches as an example. Security patching is well suited to automation, because advisories are released without notice and often demand near-immediate reaction, leaving little time to plan manual activities. It is also a good example of progressive automation. As a first step, a properly automated network should automatically spot and report every security advisory or software update that affects any of its devices, as soon as it is announced. This is an incremental process, as shown in Figure 1-3, growing with the maturity of the life-cycle automation. Similar notifications and reports should be issued for ordinary new releases of firmware or software, indicating which specific devices should be updated but at first leaving administrators responsible for pushing those updates manually. As the degree of automation increases, these manual updates become automated: first the IT administrator merely initiates the process and confirms the result, and eventually no human intervention or oversight is needed at all.

It must be stressed that all of this monitoring should happen regularly, by itself: effective, real-world automation is not a series of fire-and-forget actions but a self-sustaining, incremental process. Even nontechnical notifications, such as the approaching expiration of support contracts or the mandatory phaseout of some product, should be issued and reported by the network automation system, in one place and in one coherent format, to give full visibility of what lies ahead. Ideally, network managers should always have at hand the exact, complete answer to questions like: if one of my devices fails, can I replace it with a similar device, or is that model no longer available for purchase? The network automation system should contribute to the answer by listing, in addition to all the parameters mentioned previously, the exact capabilities of each device.
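The first step of that progression, matching incoming advisories against the device inventory, can be sketched as follows. The advisory and inventory schemas, model names, and advisory identifier are invented for this example, and the naive string comparison of versions is a deliberate simplification (real version ordering needs proper parsing).

```python
# Sketch of the first automation step: match an incoming security advisory
# against the device inventory and report affected devices. All schemas,
# model names, and the advisory ID below are hypothetical.

inventory = [
    {"hostname": "sw-access-1", "model": "X-2960", "os_version": "15.2(4)"},
    {"hostname": "sw-access-2", "model": "X-2960", "os_version": "15.2(7)"},
    {"hostname": "fw-edge-1", "model": "Y-5506", "os_version": "9.8(2)"},
]

advisory = {
    "id": "ADV-2024-001",   # hypothetical advisory identifier
    "model": "X-2960",
    "fixed_in": "15.2(7)",  # versions below this are considered vulnerable
}


def affected_devices(advisory, inventory):
    """List devices of the advisory's model still running a vulnerable OS."""
    return [
        d["hostname"]
        for d in inventory
        if d["model"] == advisory["model"]
        # Naive string comparison; real tools parse version numbers properly.
        and d["os_version"] < advisory["fixed_in"]
    ]


print(affected_devices(advisory, inventory))
```

The later maturity stages in Figure 1-3 would build on exactly this output: first notifying administrators, then triggering the update with a single confirmation, and finally patching unattended.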

Figure 1-3. An example of incremental progression toward automated patching of network device security vulnerabilities.

As an extension of life-cycle automation, compliance requirements change continuously as new regulations appear for privacy, data protection, employee safety, and financial transparency. The General Data Protection Regulation (GDPR), the Sarbanes–Oxley (SOX) Act, and the Health Insurance Portability and Accountability Act (HIPAA) are only three of the many regulations that place concrete obligations on company networks in the United States, the European Union, and beyond.

While we often think of these frameworks as putting obligations on data, it is worth noting that they have an impact on networks as well. They routinely mandate what a network must guarantee (e.g., uptime) or prevent (e.g., excessive risk), and also how to present the corresponding data about the network, for example through reports in the Information Technology Infrastructure Library (ITIL) standard format.

These reports should not be prepared only when an audit is coming. They should always be ready for external or internal audits, courtesy of the network automation services. The same services should also work continuously to maintain compliance, refusing, or at least warning against, any change to the network’s configuration that would break compliance with some regulation.
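That last idea, vetting proposed changes against compliance rules before they are applied, can be illustrated with a tiny pre-check. The rules below are made up for the example and are not drawn from any actual regulation.

```python
# Sketch of continuous compliance checking: a proposed configuration change
# is validated against simple rules before it is applied. Rule contents are
# illustrative assumptions, not requirements of any real regulation.

COMPLIANCE_RULES = [
    ("logging must stay enabled", lambda cfg: cfg.get("logging") is True),
    ("telnet must stay disabled", lambda cfg: not cfg.get("telnet_enabled")),
]


def check_change(proposed_config):
    """Return the compliance rules the proposed configuration would break."""
    return [name for name, check in COMPLIANCE_RULES
            if not check(proposed_config)]


proposed = {"logging": False, "telnet_enabled": False}
violations = check_change(proposed)
if violations:
    # Refuse (or at least loudly warn about) the non-compliant change.
    print("Change refused:", violations)
```

A real system would run such checks on every change request, producing both the refusal and the audit trail that makes the always-ready reports possible.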
