Chapter 1. Introduction

Network automation is a continuous process of generating and deploying configuration changes, and of managing and operating network devices. It often implies faster configuration changes across a significant number of devices, but it is not limited to large infrastructures. It is equally important in smaller deployments, where it ensures consistency across devices and reduces the human-error factor. Automation is more than just configuration management; it is a broad area that also includes data collection from the devices, automatic troubleshooting, and self-resilience: the network can become smart enough to remediate problems by itself, depending on internal or external factors.

When speaking about network automation, there are two important classes of data to consider: configuration and operational. Configuration data refers to the actual state of the device, either the entire configuration or the configuration of a certain feature (e.g., the configuration of the NTP peers, interfaces, BGP neighbors, MPLS, etc.). On the other hand, operational data exposes information and statistics regarding the result of the configuration: for example, the synchronization of the NTP peers, the state of a BGP session, the MPLS LSP labels generated, and so on. Although most vendors expose this information, their representations differ (sometimes even between platforms produced by the same vendor).

In addition to these multivendor challenges, there are others to be considered. Traditionally, a network device does not allow running custom software; most of the time, we are only able to configure and use the equipment. For this reason, in general, network devices can only be managed remotely. However, there are also vendors producing whitebox devices (e.g., Arista, Cumulus, etc.), or others that allow containers (e.g., Cisco IOS-XR, Cisco NX-OS in the latest versions).

Regardless of the diversity of the environment and the number of platforms supported, each network faces a common set of issues: configuration generation and deployment, equipment replacement (which becomes very problematic when migrating between different operating systems), human errors, and unmonitored events (e.g., a BGP neighbor torn down due to a high number of received prefixes, NTP unsynchronized, flapping interfaces, etc.). In addition, there is the responsibility of reacting to these issues and applying the appropriate configuration changes, searching for important details, and carrying out many other related tasks.

Large networks bring these challenges to an even higher level of complexity: the tools need to scale enough to manage the entire device fleet, while network teams are bigger and engineers need to access the resources concurrently. At the same time, everything needs to be accessible to everyone, including network engineers who do not have extensive software skills. The tooling basis must be easily configurable and customizable, in such a way that it adapts to the environment. Large enterprise networks are heterogeneous in that they are built from various vendors, so being able to apply the same methodologies in a cross-platform way is equally important.

Network automation is currently implemented using various frameworks, including Salt, Ansible, Chef, and Puppet. In this book we will focus on Salt, due to its unique capabilities, flexibility, and scalability. Salt includes a variety of features out of the box, such as a REST API, real-time jobs, high availability, native encryption, the ability to use external data even at runtime, job scheduling, selective caching, and many others. Beyond these capabilities, Salt is perhaps the most scalable framework—there are well-known deployments in companies such as LinkedIn that manage many tens of thousands of devices using Salt.

Another particularity of network environments is dynamicity—there are many events continuously happening due to internal or external causes. For example, an NTP server might become unreachable, causing the device to become unsynchronized, a BGP neighbor to be torn down, and an interface optical transceiver unable to receive light; in turn, a BGP neighbor could leak routes, leaving the device vulnerable to an attacker’s attempt to log in and cause harm—the list of examples can go on and on. When unmonitored, these events can sometimes lead to disastrous consequences. Salt is an excellent option for event-driven network automation and orchestration: all the network events can be imported into Salt, interpreted, and eventually trigger configuration changes as the business logic imposes. Unsurprisingly, large-scale networks can generate many millions of important events per hour, which is why scalability is even more important.

The vendor-agnostic capabilities of Salt are leveraged through a third-party library called NAPALM, a community-maintained network automation platform. We will briefly present NAPALM and review its characteristics in “Introducing NAPALM”.

Automating networks using Salt and NAPALM requires no special software development knowledge. We will use YAML as the data representation language and Jinja as the template language (there are six simple rules—three YAML, three Jinja—as we will discuss in “Brief Introduction to Jinja and YAML”). In addition, there are some Salt-specific configuration details, covered step by step in the following chapters so that you can start from scratch and set up a complex, event-driven automation environment.

Salt and SaltStack

Salt is an open source (Apache 2 licensed), general-purpose automation tool that is used for managing systems and devices. Out of the box, it ships with a number of capabilities: Salt can run arbitrary commands, bring systems up to a desired configuration, schedule jobs, react in real time to events across an infrastructure, integrate with hundreds of third-party programs and services across dozens of operating systems, coordinate complex multisystem orchestrations, feed data from an infrastructure into a data store, extract data from a data store to distribute across an infrastructure, transfer files securely, and even more.

SaltStack is the company started by the creator of Salt to foster development and help ensure the longevity of Salt, which is heavily used by very large companies around the globe. SaltStack provides commercial support, professional services and consulting, and an enterprise-grade product that makes use of Salt to present a higher-level graphical interface and API for viewing and managing an infrastructure, particularly in team environments.

Speed is a top priority for SaltStack. As the company writes on its website:

In SaltStack, speed isn’t a byproduct, it is a design goal. SaltStack was created as an extremely fast, lightweight communication bus to provide the foundation for a remote execution engine.

Exploring the Architecture of Salt

The core of Salt is the encrypted, high-speed communication bus referenced in the quote above as well as a deeply integrated plug-in interface. The bulk of Salt is the vast ecosystem of plug-in modules that are used to perform a wide variety of actions, including remote execution and configuration management tasks, authentication, system monitoring, event processing, and data import/export.

Salt can be configured many ways, but the most common is using a high-speed networking library, ZeroMQ, to establish an encrypted, always-on connection between servers or devices across an infrastructure and a central control point called the Salt master. Massive scalability was one design goal of Salt and a single master on moderate hardware can be expected to easily scale to several thousand nodes (and up to tens of thousands of nodes with some tuning). It is also easy to set up with few steps and good default settings; first-time users often get a working installation in less than an hour.

Salt minions are servers or devices running the Salt daemon. They connect to the Salt master, which makes deployment a breeze since only the master must expose open ports and no special network access need be given to the minions. The master can be configured for high availability (HA) via Salt’s multimaster mode, or in a tiered topology for geographic or logical separation via the Syndic system. There is also an optional SSH-based transport and a REST API.

Once a minion is connected to a master and the master has accepted the public key for that minion, the two can freely communicate over an encrypted channel. The master broadcasts commands to minions, and minions deliver the results of those commands back to the master. In addition, minions can request files from the master and can continually send arbitrary events, such as system health information, logs, status checks, or system events, to name just a few.

Diving into the Salt Proxy Minion

As mentioned earlier, one of the challenges when managing network equipment is installing and executing custom software. Whitebox devices, or those whose operating systems allow containers, could potentially allow installing the salt-minion package directly. But a traditional device can only be controlled remotely, via an API or SSH.

Introduced in Salt 2015.8 (codename Beryllium), proxy minions leverage the capabilities of the regular minions (with particular configuration) and make it possible to control devices such as network gear or devices with limited CPU or memory. A proxy minion is basically a virtual minion: a process capable of running anywhere in order to control devices remotely via SSH, HTTP, or another transport mechanism.

To avoid confusion caused by nomenclature similarities with other frameworks: a proxy minion is not another machine; it is just one process associated with the managed device, thus one process per device. It is usually lightweight, consuming about 60 MB of RAM.

An intrinsic property of the proxy minions is that the connection with the remote device is always kept alive. However, they can also be designed to establish the connection only when necessary, or even let the user decide what best fits their needs (depending on how dynamic the environment is).

Because the list of device types that can be controlled through the proxy minions can be nearly infinite, each having their own properties and interface characteristics, a module (and sometimes a third-party library) is required.

Beginning with 2016.11 (Carbon), there are several proxy modules included, four of them aiming to manage network gear:

  • NAPALM (covered briefly in “Introducing NAPALM”)

  • Junos (provided by Juniper, to manage devices running Junos)

  • Cisco NXOS (for Cisco Nexus switches)

  • Cisco NSO (interface with Cisco Network Service Orchestrator)

Installing Salt: The Easy Way

SaltStack supports and maintains a shell script called Salt Bootstrap that eases the installation of the Salt master and minion on a variety of platforms. The script determines the operating system and version, then executes the necessary steps to install the Salt binaries in the best way for that system.

Therefore, the installation becomes as easy as:

wget -O bootstrap-salt.sh https://bootstrap.saltstack.com
sudo sh bootstrap-salt.sh

This fetches the bootstrap-salt.sh script from https://bootstrap.saltstack.com, then installs the Salt minion.

If you want to also install the Salt master, you only need to append the -M option:

wget -O bootstrap-salt.sh https://bootstrap.saltstack.com
sudo sh bootstrap-salt.sh -M

Introducing NAPALM

NAPALM (Network Automation and Programmability Abstraction Layer with Multivendor support) is an open source Python library that accommodates a set of methodologies for configuration management and operational data retrieval, uniformly, covering several network operating systems, including Junos, Cisco IOS-XR, Cisco IOS, Cisco NX-OS, and Arista EOS.

There are also other community-driven drivers, for Cumulus Linux, FortiOS, PAN-OS, MikroTik RouterOS, Pluribus Netvisor, and others; many more can be provided or adapted in the user’s own environment.

The operational data is represented in a cross-vendor format. For instance, when retrieving the BGP neighbors from a device running Junos, the output is a Python dictionary with the format shown in Example 1-1.

Example 1-1. NAPALM output sample from a Junos device
{
  'global': {
    'peers': {
      '192.168.0.2': {
        'address_family': {
          'ipv4': {
            'accepted_prefixes': 142,
            'received_prefixes': 142,
            'sent_prefixes': 0
          }
        },
        'description': 'Amazon',
        'is_enabled': True,
        'is_up': True,
        'local_as': 13335,
        'remote_as': 16509,
        'remote_id': '10.10.10.1',
        'uptime': 8816095
      }
    }
  }
}

The output has exactly the same structure when retrieving the BGP neighbors, using NAPALM, from a Cisco IOS-XR router or an Arista switch. The same characteristic holds for the rest of the NAPALM features, whose output structures are described in the NAPALM documentation.
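Because the structure is identical across platforms, code that consumes it never needs vendor-specific branches. As a quick illustration, the following sketch walks the dictionary from Example 1-1, here embedded as static data; on a live device the same dictionary would be returned by NAPALM’s get_bgp_neighbors call:

```python
# Static sample mirroring Example 1-1. On a live device, this dictionary
# would come from a NAPALM call such as device.get_bgp_neighbors().
bgp_neighbors = {
    'global': {
        'peers': {
            '192.168.0.2': {
                'address_family': {
                    'ipv4': {
                        'accepted_prefixes': 142,
                        'received_prefixes': 142,
                        'sent_prefixes': 0,
                    }
                },
                'description': 'Amazon',
                'is_enabled': True,
                'is_up': True,
                'local_as': 13335,
                'remote_as': 16509,
                'remote_id': '10.10.10.1',
                'uptime': 8816095,
            }
        }
    }
}

def down_peers(neighbors):
    """Return (instance, peer IP) pairs for peers that are enabled but down."""
    down = []
    for instance, data in neighbors.items():
        for peer_ip, peer in data['peers'].items():
            if peer['is_enabled'] and not peer['is_up']:
                down.append((instance, peer_ip))
    return down

print(down_peers(bgp_neighbors))  # [] -- the single peer is up
```

The same function works unchanged whether the data came from a Junos, IOS-XR, or EOS device.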

Similarly, configuration management is also cross-vendor: although the configuration loaded depends on the network OS, the methodology is the same for all platforms. For example, when applying manual configuration changes using the CLI of Cisco IOS, the changes are reflected directly in the running-config. Using NAPALM, however, the changes are stored in a buffered candidate configuration and transferred into the running config only when an explicit commit is performed. The following methods are defined for configuration management:

  • load_merge_candidate: Populates the candidate configuration, either from a file or from text.

  • load_replace_candidate: Similar to load_merge_candidate, but instead of a merge, the existing configuration will be entirely replaced with the content of the file, or the configuration loaded as text.

  • compare_config: Returns the difference between the running configuration and the candidate.

  • discard_config: Discards the changes loaded into the candidate configuration.

  • commit_config: Commits the changes loaded using load_merge_candidate or load_replace_candidate.

  • rollback: Reverts the running configuration to the previous state.
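To see how these methods fit together, here is a sketch of a cautious change workflow: load a candidate, inspect the diff, and commit only if something actually changed. The FakeDevice class is a hypothetical stand-in for a real NAPALM driver object, so the example can run without network access; only the method names are taken from the list above, and the stub’s internals are purely illustrative.

```python
def safe_commit(device, config):
    """Load a candidate, show the diff, and commit only if something changed."""
    device.load_merge_candidate(config=config)
    diff = device.compare_config()
    if not diff:
        device.discard_config()  # nothing to do, drop the empty candidate
        return None
    device.commit_config()
    return diff

class FakeDevice:
    """Minimal stand-in for a NAPALM driver object (illustrative only)."""
    def __init__(self):
        self.running = 'hostname old-name'
        self.candidate = None
        self.committed = False

    def load_merge_candidate(self, config=None):
        self.candidate = self.running + '\n' + config

    def compare_config(self):
        extra = self.candidate[len(self.running):].strip() if self.candidate else ''
        return '\n'.join('+' + line for line in extra.splitlines()) if extra else ''

    def discard_config(self):
        self.candidate = None

    def commit_config(self):
        self.running, self.candidate = self.candidate, None
        self.committed = True

device = FakeDevice()
diff = safe_commit(device, 'ntp server 10.0.0.1')
print(diff)  # +ntp server 10.0.0.1
```

With a real driver, the same safe_commit function would apply unchanged; only the device object would differ.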

NAPALM is distributed via PyPI (Python Package Index), which is the official repository for third-party Python libraries. The installation is usually as simple as running $ pip install napalm; however, the user might need to consider several system dependencies. For Salt users, the process is simplified through the napalm install formula, which performs the required steps to install the underlying packages.

The NAPALM Proxy

Beginning with Salt 2016.11 (Carbon), the cross-vendor capabilities of NAPALM have been integrated into Salt, allowing network engineers to adopt DevOps methodologies without worrying about multivendor issues.

The initial implementation was based exclusively on the NAPALM proxy minion module. In 2017.7.0 (Nitrogen) the capabilities were extended, allowing the NAPALM modules to run under a regular minion as well. In other words, if the device operating system permits, the salt-minion package can be installed directly on the device, which can then leverage the network automation methodologies through NAPALM, managing network devices just as you would servers. For example, there is a SWIX extension for Arista devices that facilitates the installation of the minion directly on the switch; see the SaltStack docs.

Brief Introduction to Jinja and YAML

Before diving into Salt-specific details, let’s have a look at two of the most widely adopted template and data representation languages: Jinja and YAML. Salt uses them by default, so it’s important that you understand the basics.

The Three Rules of YAML

YAML (a recursive acronym for “YAML Ain’t Markup Language”) is a human-readable data representation language. Three easy rules are enough to get started, but for more in-depth details we encourage you to explore the YAML documentation as well as the YAML troubleshooting tips.

Rule #1: Indentation

YAML uses a fixed indentation scheme to represent relationships between data layers. Salt requires that the indentation for each level consists of exactly two spaces. Do not use tabs.

Rule #2: Colons

Colons are used in YAML to represent hashes, or associative arrays—in other words, one-to-one mappings between a key and a value. For example, to assign the value xe-0/0/0 to the interface_name field:

interface_name: xe-0/0/0

The same rule extends to higher levels, using nested key-value pairs, where the role of indentation becomes apparent:

interface:
  name: xe-0/0/0
  shutdown: false
  subinterfaces:
    xe-0/0/0.0:
      ipv4:
        address: 172.17.17.1/24

Rule #3: Dashes

Dashes are used to represent a list of items. For example:

interfaces:
  - fa1/0/0
  - fa4/0/0
  - fa5/0/0

Note the single space following the hyphen.
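Putting the colon and dash rules together, both structures above can be parsed from Python. This sketch assumes the third-party PyYAML package is installed (pip install pyyaml); Salt performs an equivalent parsing step internally.

```python
# Parse a YAML document combining nested mappings (colons) and a list (dashes).
import yaml

document = """
interface:
  name: xe-0/0/0
  shutdown: false
interfaces:
  - fa1/0/0
  - fa4/0/0
  - fa5/0/0
"""

data = yaml.safe_load(document)
print(data['interface']['name'])  # xe-0/0/0
print(data['interfaces'][1])      # fa4/0/0
```

Note that the unquoted false becomes a native Python boolean, not a string.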

The Three Rules of Jinja

Jinja is a widely used templating language for Python. Like any template engine, it uses abstract models and data to generate documents. While Jinja can be quite complex, three simple rules will suffice to get started with using it.

Rule #1: Double curly braces

Double curly braces mark the replacement of a variable with its value. For instance, the template in Example 1-2 will generate the result in Example 1-3 when the variable interface_name has the value xe-0/0/0.

Example 1-2. Example of double curly braces
interface {{ interface_name }}
Example 1-3. Rendering result of Jinja curly braces
interface xe-0/0/0

We will see later how you can send the variables to the template. For the moment the most important thing to note is that the output is plain text where the {{ interface_name }} has been replaced with the value of the interface_name variable.
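For the curious, the same substitution can be reproduced directly with the Jinja2 library (a sketch assuming the third-party Jinja2 package is installed: pip install jinja2); Salt passes variables to templates in essentially this way.

```python
# Render the template from Example 1-2 programmatically.
from jinja2 import Template

template = Template('interface {{ interface_name }}')
print(template.render(interface_name='xe-0/0/0'))  # interface xe-0/0/0
```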

Rule #2: Conditional tests

Conditional operators can be used to make decisions and generate different documents or parts of a document. The syntax of an if-elif-else conditional test is as follows:

{% if interface_name == 'xe-0/0/0' %}
The interface is 10-Gigabit Ethernet.
{% elif interface_name == 'ge-0/0/0' %}
The interface is Gigabit Ethernet.
{% else %}
Different type.
{% endif %}

In this example, the template generates the output “The interface is 10-Gigabit Ethernet.” when the variable interface_name is xe-0/0/0, “The interface is Gigabit Ethernet.” when it has the value ge-0/0/0, and “Different type.” otherwise.

Note that the endif keyword at the end of the block is mandatory. The {% marker denotes the beginning of a Jinja instruction; by default it also leaves an additional blank line in the output. To avoid this, it can be written as {%- instead. Similarly, to avoid a newline at the end of the instruction, it can be written as -%}.
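The effect of these whitespace-control markers is easy to observe with the Jinja2 library (a sketch assuming the third-party Jinja2 package is installed):

```python
# Compare the rendered output with and without whitespace control.
from jinja2 import Template

with_blank = Template('{% if up %}\nenabled\n{% endif %}').render(up=True)
trimmed = Template('{% if up -%}\nenabled\n{%- endif %}').render(up=True)

print(repr(with_blank))  # '\nenabled\n' -- newlines around the tags survive
print(repr(trimmed))     # 'enabled'     -- the dashes strip them
```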

Rule #3: Loops

Looping through a list of values has the format shown in Example 1-4, which generates the text in Example 1-5 when the variable interfaces is an array containing the values ['fa1/0/0', 'fa4/0/0', 'fa5/0/0'].

Example 1-4. Example of Jinja template loop
{% for interface_name in interfaces -%}
interface {{ interface_name }}
  no shut
{% endfor -%}
Example 1-5. Rendering result of Jinja loop
interface fa1/0/0
  no shut
interface fa4/0/0
  no shut
interface fa5/0/0
  no shut

For more advanced topics, consult the Jinja documentation. The following chapters will cover some other Salt-specific advanced templating methodologies.

Extensible and Scalable Configuration Files: SLS

One of the most important characteristics of Salt is that data is key, not the representation of that data. SLS (SaLt State) is the file format used by Salt. By default it is a mixture of Jinja and YAML (i.e., YAML generated from a Jinja template), but flexible enough to allow other combinations of template and data representation languages.

The SLS files can be equally complex Jinja templates that translate down to YAML or they can just be plain and simple YAML files. For instance, let’s see how we would declare a list of interfaces for a Juniper device in an SLS file (Example 1-6).

Example 1-6. Sample SLS file: Plain YAML
interfaces:
  - xe-0/0/0
  - xe-0/0/1
  - xe-0/0/2
  - xe-0/0/3
  - xe-0/0/4

The same list can be generated dynamically using Jinja and YAML (Example 1-7).

Example 1-7. Sample SLS file: Jinja and YAML
interfaces:
{% for index in range(5) -%}
  - xe-0/0/{{ index }}
{% endfor -%}

Both of these representations are interpreted by Salt in the same way, but the usage of Jinja together with YAML makes the code in Example 1-7 more flexible. Although the list shown here is very short, this methodology proves really helpful when generating dynamic content, as it saves you from having to manually write a long file.

The user can choose between the following template languages: Jinja (default), Mako, Cheetah, Genshi, Wempy, or Py (which is the pure Python renderer). Similarly, there is a variety of data representation languages that can be used: YAML (default), YAMLEX, JSON, JSON5, HJSON, or Py (pure Python). Even more, the user can always extend the capabilities and define a custom renderer in their private environment—and eventually open source it.

Note

The Salt rendering pipeline processes template rendering first to produce the data representation, which is then given to Salt’s State compiler. By default the SLS first renders the Jinja content followed by translating the YAML into a Python object.

Alternative renderers can be enabled by adding a hashbang at the top of the SLS file. For example, using the hashbang #!mako|json will instruct Salt to interpret the SLS file using Mako and JSON. In that case, the SLS file would be written as shown in Example 1-8.

Example 1-8. Sample SLS file: Mako and JSON
#!mako|json
{
  "interfaces": [
    % for index in range(5):
    "xe-0/0/${index}",
    % endfor
  ]
}

Without shifting the focus to Mako, the most important detail to note is the flexibility of the SLS file. Moreover, if the user has even more specific needs, the good news is that the renderers are one of the many pluggable interfaces of Salt, hence a new renderer can be added very easily.

Note

The #!jinja|yaml header is implicit.

Sensitive data can be natively encrypted using GPG and Salt will decrypt it during runtime.

When inserting GPG-encrypted data it is necessary to explicitly use the hashbang with the appropriate template and data representation languages. For example, even if we work with the default Jinja/YAML combination, the header needs to be #!jinja|yaml|gpg.

The GPG renderer has more specific configuration requirements, in particular on the master, but they are beyond the scope of this book. For more information, consult the setup notes.

Remarkably, the SLS file can even be written in pure Python. For instance, the list of interfaces from before could be rewritten as shown in Example 1-9.

Example 1-9. Sample SLS file: Pure Python
#!py

def run():
    return [ 'xe-0/0/{}'.format(index)
             for index in range(5) ]

The pure Python renderer is extremely powerful; it is basically limited only by Python itself. The only constraint is to define the run function, which returns a JSON-serializable object that constitutes our data. We can go even further and design a very intelligent SLS file that builds its content based on external services or sources.

For instance, as shown in Example 1-10, we can build the list of interfaces dynamically by retrieving the data from a REST API found at the URL https://interfaces-api.

Example 1-10. The flexibility of the pure Python renderer
#!py

import requests

def run():
    ret = requests.get('https://interfaces-api')
    return ret.json()

We will see later how the SLS files can be consumed and how we can leverage their power to help us with automating.
