Chapter 4. Core Practice: Define Everything as Code

In Chapter 1, I identified three core practices that help you to change infrastructure rapidly and reliably: define everything as code, continuously test and deliver all work in progress, and build small, simple pieces.

This chapter delves into the first of these core practices, starting with the banal questions. Why would you want to define your Infrastructure as Code? What types of things can and should you define as code?

At first glance, “define everything as code” might seem obvious in the context of this book. But the characteristics of different types of languages are relevant to the following chapters. In particular, Chapter 5 describes using declarative languages to define either low-level (“Low-Level Infrastructure Languages”) or high-level stacks (“High-Level Infrastructure Languages”), and Chapter 16 explains when declarative or programmatic code is most appropriate for creating reusable code modules and libraries.

Why You Should Define Your Infrastructure as Code

There are simpler ways to provision infrastructure than writing a bunch of code and then feeding it into a tool. Go to the platform’s web-based user interface and poke and click an application server cluster into being. Drop to the prompt, and using your command-line prowess, wield the vendor’s CLI (command-line interface) tool to forge an unbreakable network boundary.

But seriously, the previous chapters have explained why it’s better to use code to build your systems, including reusability, consistency, and transparency (see “Core Practice: Define Everything as Code”).

Implementing and managing your systems as code enables you to leverage speed to improve quality. It’s the secret sauce that powers high performance as measured by the four key metrics (see “The Four Key Metrics”).

What You Can Define as Code

Every infrastructure tool has a different name for its source code—for example, playbooks, cookbooks, manifests, and templates. I refer to these in a general sense as infrastructure code, or sometimes as an infrastructure definition.

Infrastructure code specifies both the infrastructure elements you want and how you want them configured. You run an infrastructure tool to apply your code to an instance of your infrastructure. The tool either creates new infrastructure, or it modifies existing infrastructure to match what you’ve defined in your code.

Some of the things you should define as code include:

An infrastructure stack, which is a collection of elements provisioned from an infrastructure cloud platform. See Chapter 3 for more about infrastructure platforms, and Chapter 5 for an introduction to the infrastructure stack concept.
Elements of a server’s configuration, such as packages, files, user accounts, and services (Chapter 11).
A server role, which is a collection of server elements that are applied together to a single server instance (“Server Roles”).
A server image definition, which generates an image for building multiple server instances (“Tools for Building Server Images”).
An application package, which defines how to build a deployable application artifact, including containers (Chapter 10).
Configuration and scripts for delivery services, which include pipelines and deployment (“Delivery Pipeline Software and Services”).
Configuration for operations services, such as monitoring checks.
Validation rules, which include both automated tests and compliance rules (Chapter 8).

Choose Tools with Externalized Configuration

Infrastructure as Code, by definition, involves specifying your infrastructure in text-based files. You manage these files separately from the tools that you use to apply them to your system. You can read, edit, analyze, and manipulate your specifications using any tools you want.
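As a minimal sketch of what "any tools you want" can mean, the following Python reads a specification held as plain text and checks it against a team rule. The JSON layout, the field names, and the 8 GB limit are all assumptions invented for this illustration, not the format of any real tool.

```python
import json

# Hypothetical specification text; in practice this would be read from a file
# kept in source control. The field names are invented for this sketch.
SPEC_TEXT = """
{
  "servers": [
    {"name": "my_application_server", "cpu": 2, "ram_gb": 2},
    {"name": "my_database_server", "cpu": 4, "ram_gb": 16}
  ]
}
"""

MAX_RAM_GB = 8  # assumed team policy, purely illustrative


def oversized_servers(spec_text, max_ram_gb=MAX_RAM_GB):
    """Return the names of servers whose declared RAM exceeds the limit."""
    spec = json.loads(spec_text)
    return [s["name"] for s in spec["servers"] if s["ram_gb"] > max_ram_gb]


if __name__ == "__main__":
    for name in oversized_servers(SPEC_TEXT):
        print(f"{name} declares more than {MAX_RAM_GB} GB of RAM")
```

Because the specification is ordinary text, a check like this can run in an editor, in a CI job, or on a laptop, without going through the tool that eventually applies the specification.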

Noncode infrastructure automation tools store infrastructure definitions as data that you can’t directly access. Instead, you can only use and edit the specifications by using the tool itself. The tool may have some combination of GUI, API, and command-line interfaces.

The issue with these closed-box tools is that they limit the practices and workflows you can use:

You can only version your infrastructure specifications if the tool has built-in versioning.
You can only use CI if the tool has a way to trigger a job automatically when you make a change.
You can only create delivery pipelines if the tool makes it easy to version and promote your infrastructure specifications.

Lessons from Software Source Code

The externalized configuration pattern mirrors the way most software source code works. Some development environments, such as Visual Basic for Applications, keep source code hidden away. But for nontrivial systems, developers find that keeping their source code in external files is more powerful.

It is challenging to use Agile engineering practices such as TDD, CI, and CD with closed-box infrastructure management tools.

A tool that uses external code for its specifications doesn’t constrain you to use a specific workflow. You can use an industry-standard source control system, text editor, CI server, and automated testing framework. You can build delivery pipelines using the tool that works best for you.

Manage Your Code in a Version Control System

If you’re defining your stuff as code, then putting that code into a version control system (VCS) is simple and powerful. By doing this, you get:

Traceability: VCS provides a history of changes, who made them, and context about why.¹ This history is invaluable when debugging problems.
Rollback: When a change breaks something—and especially when multiple changes break something—it’s useful to be able to restore things to exactly how they were before.
Correlation: Keeping scripts, specifications, and configuration in version control helps when tracing and fixing gnarly problems. You can correlate across pieces with tags and version numbers.
Visibility: Everyone can see each change committed to the version control system, giving the team situational awareness. Someone may notice that a change has missed something important. If an incident happens, people are aware of recent commits that may have triggered it.
Actionability: The VCS can trigger an action automatically for each change committed. Triggers enable CI jobs and CD pipelines.

One thing that you should not put into source control is unencrypted secrets, such as passwords and keys. Even if your source code repository is private, history and revisions of code are too easily leaked. Secrets leaked from source code are one of the most common causes of security breaches. See “Handling Secrets as Parameters” for better ways to manage secrets.
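One way to reduce the risk is to check text for likely secrets before it is committed. The sketch below is deliberately naive: the regular expression and the function name are assumptions made for this example, and a pattern this simple will both miss real secrets and flag harmless lines, so treat it as an illustration rather than a safeguard.

```python
import re

# Naive pattern: a name that looks like a credential, followed by = or :,
# followed by a quoted literal value. Invented for this sketch.
SECRET_PATTERN = re.compile(
    r"""(password|passwd|secret|api_key|token)\s*[:=]\s*['"][^'"]+['"]""",
    re.IGNORECASE,
)


def find_suspect_lines(text):
    """Return (line_number, line) pairs that look like hardcoded secrets."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]


if __name__ == "__main__":
    sample = 'db_user = "app"\ndb_password = "hunter2"\n'
    for number, line in find_suspect_lines(sample):
        print(f"line {number}: possible secret: {line}")
```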

Infrastructure Coding Languages

System administrators have been using scripts to automate infrastructure management tasks for decades. General-purpose scripting languages like Bash, Perl, PowerShell, Ruby, and Python are still an essential part of an infrastructure team’s toolkit.

CFEngine pioneered the use of declarative, domain-specific languages (DSL; see “Domain-Specific Infrastructure Languages”) for infrastructure management. Puppet and then Chef emerged alongside mainstream server virtualization and IaaS cloud. Ansible, SaltStack, and others followed.

Stack-oriented tools like Terraform and CloudFormation arrived a few years later, using the same declarative DSL model. Declarative languages simplified infrastructure code, by separating the definition of what infrastructure you want from how to implement it.

Recently, there has been a trend of new infrastructure tools that use existing general-purpose programming languages to define infrastructure.² Pulumi and the AWS CDK (Cloud Development Kit) support languages like TypeScript, Python, and Java. These tools have emerged to address some of the limitations of declarative languages.

Mixing Declarative and Imperative Code

Imperative code is a set of instructions that specifies how to make a thing happen. Declarative code specifies what you want, without specifying how to make it happen.³

Too much infrastructure code in production today suffers from mixing declarative and imperative code. I believe it’s an error to insist that one or the other of these two language paradigms should be used for all infrastructure code.

An infrastructure codebase involves many different concerns, from defining infrastructure resources, to configuring different instances of otherwise similar resources, to orchestrating the provisioning of multiple interdependent pieces of a system. Some of these concerns can be expressed most simply using a declarative language. Some concerns are more complex, and are better handled with an imperative language.

As practitioners of the still-young field of infrastructure code, we are still exploring where to draw boundaries between these concerns. Mixing concerns can lead to code that mixes language paradigms. One failure mode is extending a declarative syntax like YAML to add conditionals and loops. Another failure mode is embedding simple configuration data (“2GB RAM”) into procedural code, mixing what you want with how to implement it.

In relevant parts of this book I point out where I believe some of the different concerns may be, and where I think one or another language paradigm may be most appropriate. But our field is still evolving. Much of my advice will be wrong or incomplete. So, my intention is to encourage you, the reader, to think about these questions and help us all to discover what works best.

Infrastructure Scripting

Before standard tools appeared for provisioning cloud infrastructure declaratively, we wrote scripts in general-purpose, procedural languages. Our scripts typically used an SDK (software development kit) to interact with the cloud provider’s API.

Example 4-1 uses pseudocode, and is similar to scripts that I wrote in Ruby with the AWS SDK. It creates a server named my_application_server and then runs the (fictional) Servermaker tool to configure it.

Example 4-1. Example of procedural code that creates a server

import 'cloud-api-library'

network_segment = CloudApi.find_network_segment('private')

app_server = CloudApi.find_server('my_application_server')
if(app_server == null) {
  app_server = CloudApi.create_server(
    name: 'my_application_server',
    image: 'base_linux',
    cpu: 2,
    ram: '2GB',
    network: network_segment
  )
  while(app_server.ready == false) {
    wait 5
  }
  if(app_server.ok != true) {
    throw ServerFailedError
  }
  app_server.provision(
    provisioner: 'servermaker',
    role: 'tomcat_server'
  )
}

This script mixes what to create and how to create it. It specifies attributes of the server, including the CPU and memory resources to provide it, what OS image to start from, and what role to apply to the server. It also implements logic: it checks whether the server named my_application_server already exists, to avoid creating a duplicate server, and then it waits for the server to become ready before applying configuration to it.

This example code doesn’t handle changes to the server’s attributes. What if you need to increase the RAM? You could change the script so that if the server exists, the script will check each attribute and change it if necessary. Or you could write a new script to find and change existing servers.
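To make the first option concrete, here is a sketch in Python of a script that also reconciles attributes on an existing server. FakeCloud is an in-memory stand-in invented for this illustration, not a real SDK, and its method names are assumptions. Notice that the desired values remain tangled up with the logic that applies them.

```python
class FakeCloud:
    """In-memory stand-in for a cloud API, invented for this sketch."""

    def __init__(self):
        self.servers = {}

    def find_server(self, name):
        return self.servers.get(name)

    def create_server(self, name, **attributes):
        self.servers[name] = dict(attributes)
        return self.servers[name]

    def update_server(self, name, **changes):
        self.servers[name].update(changes)


def ensure_app_server(cloud):
    """Create the server if it is missing, otherwise fix drifted attributes."""
    name = "my_application_server"
    wanted = {"image": "base_linux", "cpu": 2, "ram": "4GB"}

    server = cloud.find_server(name)
    if server is None:
        cloud.create_server(name, **wanted)
        return "created"

    # Compare each attribute with what we want and collect the differences.
    changes = {k: v for k, v in wanted.items() if server.get(k) != v}
    if changes:
        cloud.update_server(name, **changes)
        return "updated"
    return "unchanged"


if __name__ == "__main__":
    cloud = FakeCloud()
    cloud.create_server("my_application_server",
                        image="base_linux", cpu=2, ram="2GB")
    print(ensure_app_server(cloud))  # the RAM differs, so the server is updated
    print(ensure_app_server(cloud))  # nothing left to change
```

Every new attribute or server type adds more of this comparison logic to the script, which is the burden that declarative tools take on for you.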

More realistic scenarios include multiple servers of different types. In addition to our application server, my team had web servers and database servers. We also had multiple environments, which meant multiple instances of each server.

Teams I worked with often turned simplistic scripts like the one in this example into a multipurpose script. This kind of script would take arguments specifying the type of server and the environment, and use these to create the appropriate server instance. We evolved this into a script that would read configuration files that specify various server attributes.

I was working on a script like this, wondering if it would be worth releasing it as an open source tool, when HashiCorp released the first version of Terraform.

Declarative Infrastructure Languages

Many infrastructure code tools, including Ansible, Chef, CloudFormation, Puppet, and Terraform, use declarative languages. Your code defines your desired state for your infrastructure, such as which packages and user accounts should be on a server, or how much RAM and CPU resources it should have. The tool handles the logic of how to make that desired state come about.

Example 4-2 creates the same server as Example 4-1. The code in this example (as with most code examples in this book) is written in a fictional language.⁴

Example 4-2. Example of declarative code

virtual_machine:
  name: my_application_server
  source_image: 'base_linux'
  cpu: 2
  ram: 2GB
  network: private_network_segment
  provision:
    provisioner: servermaker
    role: tomcat_server

This code doesn’t include any logic to check whether the server already exists or to wait for the server to come up before running the server provisioner. The tool that you run to apply the code takes care of that. The tool also checks the current attributes of infrastructure against what is declared, and works out what changes to make to bring the infrastructure in line. So to increase the RAM of the application server in this example, you edit the file and rerun the tool.
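The comparison step can be sketched in a few lines of Python. This illustrates the general idea only and is not how Terraform or any other tool is implemented: plan_changes is a hypothetical function that compares a declared definition with the attributes currently observed and lists the actions needed.

```python
def plan_changes(declared, current):
    """Work out the actions needed to make `current` match `declared`.

    `current` is None when the resource does not exist yet.
    """
    if current is None:
        return [("create", dict(declared))]

    actions = []
    for key, wanted in declared.items():
        if current.get(key) != wanted:
            actions.append(("update", {key: wanted}))
    return actions


if __name__ == "__main__":
    declared = {"name": "my_application_server", "cpu": 2, "ram": "4GB"}
    current = {"name": "my_application_server", "cpu": 2, "ram": "2GB"}

    print(plan_changes(declared, None))      # resource missing: one create action
    print(plan_changes(declared, current))   # only the RAM differs
    print(plan_changes(declared, declared))  # already in line: no actions
```

The declared dictionary plays the role of the code you write, while plan_changes stands in for the logic that lives inside the tool.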

Declarative infrastructure tools like Terraform and Chef separate what you want from how to create it. As a result, your code is cleaner and more direct. People sometimes describe declarative infrastructure code as being closer to configuration than to programming.

Is Declarative Code Real Code?

Some people dismiss declarative languages as being mere configuration, rather than “real” code.

I use the word code to refer to both declarative and imperative languages. When I need to distinguish between the two, I specifically say either “declarative” or “programmable,” or some variation.

I don’t find the debate about whether a coding language must be Turing-complete to be useful. I even find regular expressions useful for some purposes, and they aren’t Turing-complete either. So, my devotion to the purity of “real” programming may be lacking.

Idempotency

Continuously applying code is an important practice for maintaining the consistency and control of your infrastructure code, as described in “Apply Code Continuously”. This practice involves repeatedly reapplying code to infrastructure to prevent drift. Code must be idempotent to be safely applied continuously.

You can rerun idempotent code any number of times without changing the output or outcome. If you run a tool that isn’t idempotent multiple times, it might make a mess of things.

Here’s an example of a shell script that is not idempotent:

echo "spock:*:1010:1010:Spock:/home/spock:/bin/bash" \
    >> /etc/passwd

If you run this script once, you get the outcome you want: the user spock is added to the /etc/passwd file. If you run it ten times, you’ll end up with ten identical entries for this same user.

With an idempotent infrastructure tool, you specify how you want things to be:

user:
  name: spock
  full_name: Spock
  uid: 1010
  gid: 1010
  home: /home/spock
  shell: /bin/bash

No matter how many times you run the tool with this code, it will ensure that only one entry exists in the /etc/passwd file for the user spock. No unpleasant side effects.
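The same idea can be shown in Python. The sketch below works on the text of a passwd-style file held in a string rather than touching /etc/passwd itself. ensure_user is a hypothetical helper, and real tools also handle details such as file locking and updating changed fields, which are left out here.

```python
def ensure_user(passwd_text, entry):
    """Return passwd_text with `entry` present exactly once.

    Idempotent: applying it to its own output changes nothing.
    """
    username = entry.split(":", 1)[0]
    lines = passwd_text.splitlines()
    # Look for an existing entry for this user instead of blindly appending.
    if any(line.split(":", 1)[0] == username for line in lines):
        return passwd_text
    lines.append(entry)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spock = "spock:*:1010:1010:Spock:/home/spock:/bin/bash"
    passwd = "root:*:0:0:root:/root:/bin/bash\n"

    once = ensure_user(passwd, spock)
    ten_times = passwd
    for _ in range(10):
        ten_times = ensure_user(ten_times, spock)

    print(once == ten_times)  # True: repeated runs give the same result
```

The difference from the shell script is the check before the write: the function looks at the current state and acts only when that state differs from what was asked for.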

Programmable, Imperative Infrastructure Languages

Declarative code is fine when you always want the same outcome. But there are situations where you want different results depending on the circumstances. For example, the following code creates a set of VLANs. The ShopSpinner team’s cloud provider has a different number of data centers in each country, and the team wants its code to create one VLAN in each data center. So the code needs to dynamically discover how many data centers there are, and create a VLAN in each one:

this_country = getArgument("country")
data_centers = CloudApi.find_data_centers(country: this_country)
full_ip_range = "10.2.0.0/16"

vlan_number = 0
for $DATA_CENTER in data_centers {
  vlan = CloudApi.vlan.apply(
    name: "public_vlan_${DATA_CENTER.name}",
    data_center: $DATA_CENTER.id,
    ip_range: Networking.subrange(
        full_ip_range,
        data_centers.howmany,
        vlan_number++
    )
  )
}

The code also assigns an IP range for each VLAN, using a fictional but useful method called Networking.subrange(). This method takes the address space declared in full_ip_range, divides it into a number of smaller address spaces based on the value of data_centers.howmany, and returns one of those address spaces, the one indexed by the vlan_number variable, which the loop increments for each data center.
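Networking.subrange() is fictional, but Python's standard ipaddress module can approximate it. The following is a sketch under stated assumptions: the function name and argument order mirror the pseudocode, and because subnets come in power-of-two sizes, the range is split into the smallest power-of-two number of pieces that covers the requested count.

```python
import ipaddress


def subrange(full_range, how_many, index):
    """Split full_range into at least how_many equal subnets; return one.

    Subnet sizes are powers of two, so the range is split into the smallest
    power-of-two number of pieces that is >= how_many.
    """
    network = ipaddress.ip_network(full_range)
    if not 0 <= index < how_many:
        raise IndexError("index must be between 0 and how_many - 1")
    extra_bits = (how_many - 1).bit_length()  # ceil(log2(how_many))
    subnets = network.subnets(new_prefix=network.prefixlen + extra_bits)
    return str(list(subnets)[index])


if __name__ == "__main__":
    # Three data centers: the /16 is split into four /18 ranges, three are used.
    for vlan_number in range(3):
        print(subrange("10.2.0.0/16", 3, vlan_number))
```

With three data centers, one of the four /18 ranges stays unused. Whether that waste is acceptable is a design choice that a real implementation would need to make explicitly.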

This type of logic can’t be expressed using declarative code, so most declarative infrastructure tools extend their languages to add imperative programming capability. For example, Ansible adds loops and conditionals to YAML. Terraform’s HCL configuration language is often described as declarative, but it actually combines three sublanguages. One of these is expressions, which includes conditionals and loops.

Newer tools, such as Pulumi and the AWS CDK, return to using programmatic languages for infrastructure. Much of their appeal is their support for general-purpose programming languages (as discussed in “General-Purpose Languages Versus DSLs for Infrastructure”). But they are also valuable for implementing more dynamic infrastructure code.

Rather than seeing either declarative or imperative infrastructure languages as the correct paradigm, we should look at which types of concerns each one is most suited for.

Declarative Versus Imperative Languages for Infrastructure

Declarative code is useful for defining the desired state of a system, particularly when there isn’t much variation in the outcomes you want. It’s common to define the shape of some infrastructure that you would like to repeat with a high level of consistency.

For example, you normally want all of the environments supporting a release process to be nearly identical (see “Delivery Environments”). So declarative code is good for defining reusable environments, or parts of environments (per the reusable stack pattern discussed in “Pattern: Reusable Stack”). You can even support limited variations between instances of infrastructure defined with declarative code using instance configuration parameters, as described in Chapter 7.

However, sometimes you want to write reusable, sharable code that can produce different outcomes depending on the situation. For example, the ShopSpinner team writes code that can build infrastructure for different application servers. Some of these servers are public-facing, so they need appropriate gateways, firewall rules, routes, and logging. Other servers are internally facing, so they have different connectivity and security requirements. The infrastructure might also differ for applications that use messaging, data storage, and other optional elements.

As declarative code supports more complex variations, it involves increasing amounts of logic. At some point, you should question why you are writing complex logic in YAML, JSON, XML, or some other declarative language.

So programmable, imperative languages are more appropriate for building libraries and abstraction layers, as discussed in more detail in Chapter 16. And these languages tend to have better support for writing, testing, and managing libraries.
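As a rough sketch of the kind of library function this describes, the Python below returns a different set of infrastructure elements depending on its arguments. The function name and the element names are placeholders invented for this illustration. A real library would create resources through a platform API rather than return strings.

```python
def application_infrastructure(name, public_facing, uses_messaging=False):
    """Return the list of infrastructure elements for one application server.

    The element names are placeholders invented for this sketch.
    """
    elements = [f"server:{name}", f"logging:{name}"]
    if public_facing:
        # Public-facing servers need a gateway and firewall rules for web traffic.
        elements += [f"gateway:{name}", f"firewall_rule:{name}:allow_https"]
    else:
        # Internal servers only accept traffic from inside the network.
        elements += [f"firewall_rule:{name}:allow_internal_only"]
    if uses_messaging:
        elements.append(f"message_queue:{name}")
    return elements


if __name__ == "__main__":
    print(application_infrastructure("storefront", public_facing=True))
    print(application_infrastructure("inventory", public_facing=False,
                                     uses_messaging=True))
```

Expressing these branches in a declarative format would mean bolting conditionals onto the data, which is the situation the preceding paragraphs warn about.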

Domain-Specific Infrastructure Languages

In addition to being declarative, many infrastructure tools use their own DSL, or domain-specific language.⁵

A DSL is a language designed to model a specific domain, in our case infrastructure. This makes it easier to write code, and makes the code easier to understand, because it maps closely to the things you’re defining.

For example, Ansible, Chef, and Puppet each have a DSL for configuring servers. Their languages provide constructs for concepts like packages, files, services, and user accounts. A pseudocode example of a server configuration DSL is:

package: jdk
package: tomcat

service: tomcat
  port: 8443
  user: tomcat
  group: tomcat

file: /var/lib/tomcat/server.conf
  owner: tomcat
  group: tomcat
  mode: 0644
  contents: $TEMPLATE(/src/appserver/tomcat/server.conf.template)

This code ensures that two software packages are installed, jdk and tomcat. It defines a service that should be running, including the port it listens to and the user and group it should run as. Finally, the code defines that a server configuration file should be created from a template file.

The example code is pretty easy for someone with systems administration knowledge to understand, even if they don’t know the specific tool or language. Chapter 11 discusses how to use server configuration languages.

Many stack management tools also use DSLs, including Terraform and CloudFormation. They expose concepts from their own domain, infrastructure platforms, so that you can directly write code that refers to virtual machines, disk volumes, and network routes. See Chapter 5 for more on using these languages and tools.

Other infrastructure DSLs model application runtime platform concepts. These model systems like application clusters, service meshes, or applications. Examples include Helm charts and Cloud Foundry app manifests.

Many infrastructure DSLs are built as extensions of existing markup languages such as YAML (Ansible, CloudFormation, anything related to Kubernetes) and JSON (Packer, CloudFormation). Some are internal DSLs, written as a subset (or superset) of a general-purpose programming language. Chef is an example of an internal DSL, written as Ruby code. Others are external DSLs, which are interpreted by code written in a different language. Terraform HCL is an external DSL; the code is not related to the Go language its interpreter is written in.
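To illustrate the internal DSL idea, here is a toy sketch in Python rather than Ruby. The package and service functions are hypothetical and only record declarations in a list. They are not Chef's API or that of any real tool. The point is that the definitions are ordinary host-language code that reads like declarations.

```python
# A toy internal DSL: plain Python functions that record declared resources.
declared_resources = []


def package(name):
    """Declare that a package should be installed."""
    declared_resources.append({"type": "package", "name": name})


def service(name, port, user):
    """Declare that a service should be running."""
    declared_resources.append(
        {"type": "service", "name": name, "port": port, "user": user}
    )


# The "infrastructure code" itself is just calls in the host language.
package("jdk")
package("tomcat")
service("tomcat", port=8443, user="tomcat")

if __name__ == "__main__":
    for resource in declared_resources:
        print(resource)
```

An external DSL would instead need its own parser for the definition text, independent of the language the tool is written in.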

General-Purpose Languages Versus DSLs for Infrastructure

Most infrastructure DSLs are declarative languages rather than imperative languages. An internal DSL like Chef is an exception, although even Chef is primarily declarative.⁶

One of the biggest advantages of using a general-purpose programming language, such as JavaScript, Python, Ruby, or TypeScript, is the ecosystem of tools. These languages are very well supported by IDEs,⁷ with powerful productivity features like syntax highlighting and code refactoring. Testing support is an especially useful part of a programming language’s ecosystem.

Many infrastructure testing tools exist, some of which are listed in “Verification: Making Assertions About Infrastructure Resources” and “Testing Server Code”. But few of these integrate with languages to support unit testing. As we’ll discuss in “Challenge: Tests for Declarative Code Often Have Low Value”, this may not be an issue for declarative code. But for code that produces more variable outputs, such as libraries and abstraction layers, unit testing is essential.
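As a small illustration of unit testing library-style infrastructure code, the sketch below uses Python's standard unittest module. vlan_name is a hypothetical helper invented for this example. What matters is that code with variable outputs has behavior worth asserting on.

```python
import unittest


def vlan_name(data_center_name):
    """Hypothetical library function: build a VLAN name for a data center."""
    cleaned = data_center_name.strip().lower().replace(" ", "_")
    if not cleaned:
        raise ValueError("data center name must not be empty")
    return f"public_vlan_{cleaned}"


class VlanNameTest(unittest.TestCase):
    def test_normalizes_case_and_spaces(self):
        self.assertEqual(vlan_name(" East 1 "), "public_vlan_east_1")

    def test_rejects_empty_name(self):
        with self.assertRaises(ValueError):
            vlan_name("   ")


if __name__ == "__main__":
    # Pass an explicit argv and exit=False so the run ignores any
    # command-line arguments and does not terminate the interpreter.
    unittest.main(argv=["vlan_name_test"], exit=False)
```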

Implementation Principles for Defining Infrastructure as Code

To update and evolve your infrastructure systems easily and safely, you need to keep your codebase clean: easy to understand, test, maintain, and improve. Code quality is a familiar theme in software engineering. The following implementation principles are guidelines for designing and organizing your code to support this goal.

Separate Declarative and Imperative Code

Code that mixes both declarative and imperative code is a design smell that suggests you should split the code into separate concerns.⁸

Treat Infrastructure Code Like Real Code

Many infrastructure codebases evolve from configuration files and utility scripts into an unmanageable mess. Too often, people don’t consider infrastructure code to be “real” code. They don’t give it the same level of engineering discipline as application code. To keep an infrastructure codebase maintainable, you need to treat it as a first-class concern.

Design and manage your infrastructure code so that it is easy to understand and maintain. Follow code quality practices, such as code reviews, pair programming, and automated testing. Your team should be aware of technical debt and strive to minimize it.

Chapter 15 describes how to apply various software design principles to infrastructure, such as improving cohesion and reducing coupling. Chapter 18 explains ways to organize and manage infrastructure codebases to make them easier to work with.

Code as Documentation

Writing documentation and keeping it up-to-date can be too much work. For some purposes, the infrastructure code is more useful than written documentation. It’s always an accurate and updated record of your system:

New joiners can browse the code to learn about the system.
Team members can read the code, and review commits, to see what other people have done.
Technical reviewers can use the code to assess what to improve.
Auditors can review code and version history to gain an accurate picture of the system.

Infrastructure code is rarely the only documentation required. High-level documentation is helpful for context and strategy. You may have stakeholders who need to understand aspects of your system but who don’t know your tech stack.

You may want to manage these other types of documentation as code. Many teams write architecture decision records (ADRs) in a markup language and keep them in source control.

You can automatically generate useful material like architecture diagrams and parameter references from code. You can put this in a change management pipeline to update documentation every time someone makes a change to the code.
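A parameter reference is one of the simpler things to generate. The sketch below assumes parameter definitions are available as a Python dictionary. The structure, the parameter names, and the output layout are all invented for this illustration.

```python
# Hypothetical parameter definitions, as they might appear in infrastructure code.
PARAMETERS = {
    "environment": {"default": "test", "description": "Name of the environment"},
    "ram": {"default": "2GB", "description": "Memory for the application server"},
}


def parameter_reference(parameters):
    """Render parameter definitions as a plain-text reference, sorted by name."""
    lines = ["Parameter reference", ""]
    for name in sorted(parameters):
        details = parameters[name]
        lines.append(
            f"{name} (default: {details['default']}): {details['description']}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(parameter_reference(PARAMETERS))
```

Running a script like this as a pipeline stage keeps the reference in step with the code, because it is regenerated on every change.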

Conclusion

This chapter detailed the core practice of defining your system as code. This included looking at why you should define things as code, and what parts of your system you can define as code. The core of the chapter explored different infrastructure language paradigms. This might seem like an abstract topic. But using the right languages in the right ways is a crucial challenge for creating effective infrastructure, a challenge that the industry hasn’t yet solved. So the question of which type of language to use in different parts of our system, and the consequences of those decisions, is a theme that will reappear throughout this book.

¹ Context about why depends on people writing useful commit messages.

² “Recently” as I write this in mid-2020.

³ I sometimes refer to imperative code or languages as programmable even though it’s not the most accurate term, because it’s more intuitive than “imperative.”

⁴ I use this pseudocode language to illustrate the concepts I’m trying to explain, without tying them to any specific tool.

⁵ Martin Fowler and Rebecca Parsons define a DSL as a “small language, focused on a particular aspect of a software system” in their book Domain-Specific Languages (Addison-Wesley Professional).

⁶ You can mix imperative Ruby code in your Chef recipes, but this gets messy. Chef interprets recipes in two phases, first compiling the Ruby code, then running the code to apply changes to the server. Procedural code is normally executed in the compile phase. This makes Chef code that mixes declarative and imperative code hard to understand. On the other hand, imperative code is useful when writing Chef providers, which are a type of library. This reinforces the idea that an imperative language works well for library code, and a declarative language works well for defining infrastructure.

⁷ Integrated Development Environment, a specialized editor for programming languages.

⁸ The term design smell derives from code smell. A “smell” is some characteristic of a system that you observe that suggests there is an underlying problem. In the example from the text, code that mixes declarative and imperative constructs is a smell. This smell suggests that your code may be trying to do multiple things and that it may be better to pull them apart into different pieces of code, perhaps in different languages.

Infrastructure as Code, 2nd Edition by Kief Morris