Chapter 1. The Puppet Design Philosophy

Before we begin to explore best practices with Puppet, it’s valuable to understand the reasoning behind these recommendations.

Puppet can be somewhat alien to technologists who have a background in shell scripting. Whereas most scripts are procedural, Puppet aims to be declarative. This allows it to take a node from an unknown state and converge it to a known, desired state.

Puppet’s declarative design principles drive the practices in the coming chapters. Although the declarative language has many advantages for configuration management, it does impose restrictions on the approaches used to solve problems. Understanding the philosophy behind the design will help contextualize the recommendations covered in this book.

Declarative Code

As we discussed a moment ago, the Puppet domain-specific language (DSL) is a declarative language, as opposed to the imperative or procedural languages with which system administrators tend to be most comfortable and familiar.

Note
An imperative language describes the actions needed to accomplish a task. A declarative language declares the desired result. You’ll see examples of these differences as we proceed.

In theory, a declarative language is ideal for baseline configuration tasks. Because declarative language defines the result, and not the path to get there, the Puppet language is (mostly) verbless. Using the Puppet DSL, we describe only the desired state of our nodes. Puppet handles all responsibility for making sure the system conforms to this desired state. Understanding and internalizing this paradigm is critical when working with Puppet.

Unfortunately, most of us are used to a procedural approach to system administration. The vast majority of the bad Puppet code I’ve seen has been the result of trying to write procedural code in Puppet rather than adapting existing procedures to a declarative approach. Attempting to force Puppet to use a procedural or imperative process quickly becomes an exercise in frustration and produces fragile code.

If your infrastructure is based around modern open source software, Puppet’s built-in types and providers offer declarative methods to handle most operational tasks. This means that writing declarative Puppet code will be relatively straightforward. In other circumstances, we might be tasked to deploy software that demands a procedural installation process. A large part of this book attempts to address how to handle many of the uncommon or irregular requirements in a declarative way.

Implementing a procedural process in Puppet might be unavoidable due to interactions with external components. If your infrastructure includes Windows nodes and enterprise software, writing declarative Puppet code can be significantly more challenging. Simply putting procedural code in Puppet is rarely elegant, often creates unexpected bugs, and is always difficult to maintain. We explore practical examples and best practices for handling procedural requirements when we look at the exec resource type in Chapter 5.

A major challenge that system administrators face when working with Puppet is our own mindset. Our daily job responsibilities have often been solved with an imperative workflow. How often have we manipulated files using regular expression substitution? How often do we massage data using a series of temp files and piped commands? Even though Puppet offers many ways to accomplish the same tasks, most of our procedural approaches do not map well into Puppet’s declarative language. We must learn a different mindset to translate imperative design into a declarative model. We explore examples of these situations and discuss declarative approaches to solving them.

What Is Declarative Code, Anyway?

As mentioned earlier, declarative code should not have verbs.

We don’t create or remove users.
We declare that users should be present or absent.
We don’t install or remove software.
We declare that software should be present or absent.

Whereas create and install are verbs, present and absent are adjectives. The difference seems trivial at first but proves to be very important in practice. Let’s examine a real-world example. Imagine that you’re being given directions to the Palace of Fine Arts in San Francisco:

  1. Head north on 19th Avenue.

  2. Get on US-101S.

  3. Take Marina Boulevard to Palace Drive.

  4. Park at the Palace of Fine Arts Theater.

These instructions make a few major assumptions:

  • You aren’t already at the Palace of Fine Arts.

  • You are driving a car.

  • You are currently in San Francisco.

  • You are currently on 19th Avenue or know how to get there.

  • You are heading north on 19th Avenue.

  • There are no road closures or other traffic disruptions that would force you to a different route.

Compare this to the declarative instructions:

  1. Be at 3301 Lyon Street, San Francisco, CA 94123 at 7:00 PM

The declarative approach has major advantages in this case, namely:

  • It requires no variants based on your current location or mode of transportation.

  • This instruction is valid whether your plans involve public transit or a parachute.

  • This instruction is valid regardless of your starting point.

  • This instruction empowers the driver to route around road closures and traffic.

In short, the declarative approach is simpler and easier to write. It allows you to choose the best way to reach the destination based on your current situation.
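Carried back into Puppet, the same contrast can be sketched with two resource declarations. This is a minimal illustration rather than a recommended manifest; the ntp package and the olduser account are hypothetical names chosen for the example. Neither declaration says how to install or delete anything, only what should be true afterward.

```puppet
# Desired state: the package is installed.
# 'ntp' is a hypothetical package name used for illustration.
package { 'ntp':
  ensure => 'present',
}

# Desired state: this account does not exist.
# 'olduser' is a hypothetical account name used for illustration.
user { 'olduser':
  ensure => 'absent',
}
```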

Resource Types and Providers

Declarative languages don’t implement the declared state with magic. Puppet’s model uses a resource type to model an object, and a resource provider to implement the state the model describes. To map this to the previous example, when an address and time are declared as the desired state, the provider is you. You are capable of navigating to the destination, calling a taxi, or flinging yourself out the window as necessary to reach that state.

Tip
Resource providers are backends that implement support for a specific implementation of a given resource type.

The major limitation imposed by Puppet’s declarative model is obvious. If a resource provider doesn’t exist for the resource type, Puppet won’t be able to ensure the modeled state. Declaring that you want a red two-story house with four bedrooms won’t accomplish anything if a construction engineer (the provider) isn’t available to implement it.

There is some good news on this front, however, because Puppet includes types and providers for common operating system (OS) resources such as users, groups, services, and packages. Further, the Puppet community and product vendors contribute additional models for add-on software. If you are considering using the exec resource type, look around and make sure someone hasn’t already created a declarative type and provider to handle it.
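As a rough sketch of how one resource type can be backed by different providers, consider the two package declarations that follow. The package names are assumptions made for illustration. The first lets Puppet choose the platform’s default provider; the second explicitly selects the gem provider.

```puppet
# Same resource type, two different providers.
# Both package names are illustrative assumptions.
package { 'git':
  ensure => 'installed',   # platform default provider (yum, apt, and so on)
}

package { 'rake':
  ensure   => 'installed',
  provider => 'gem',       # explicitly select the RubyGems provider
}
```

In both cases the model is identical: a package that should be installed. Only the backend that enforces the state differs.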

Procedural Example

Let’s examine some common procedural code intended for user management. We later discuss how to replace this code with a robust, self-healing equivalent in Puppet.

Imperative/procedural code

Example 1-1 presents an imperative process implemented using a Bash script. It creates a user and installs a Secure Shell (SSH) public key for authentication on a Red Hat–family Linux host.

Example 1-1. Imperative user creation in Bash
# groupadd examplegroup
# useradd -g examplegroup alice
# mkdir ~alice/.ssh/
# chown alice:examplegroup ~alice/.ssh
# chmod 700 ~alice/.ssh
# echo "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAm3TA...EnMSuFrD/E5TunvRHIczaI9Hy0IMXc= \
alice@localhost" > ~alice/.ssh/authorized_keys

What if we decide that this user should also be a member of the wheel group? Let’s try it:

# useradd -g examplegroup alice
# usermod -G wheel alice

And if we want to remove that user and that user’s group? Let’s give that a try:

# userdel alice
# groupdel examplegroup

Notice a few things about this example:

  • The correct process to use depends on the current state of the user.

  • Each process is different.

  • Each process will produce errors if invoked more than one time.

Imagine for a second that we have several types of nodes:

  • On some nodes, the Alice user is absent.

  • On some nodes, Alice is present but not a member of the wheel group.

  • On some nodes, Alice is present and a member of the wheel group.

  • On some nodes, Alice is present but does not have an SSH key installed.

  • On some nodes, Alice is present but does not have the correct SSH key installed.

  • ...and so on.

Imagine that we need to write a script to ensure that Alice exists, is a member of the wheel group on every system, and has the correct SSH authentication key. Example 1-2 illustrates what such a script would look like.

Example 1-2. Robust user management with Bash
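#!/bin/bash

# Create the primary group only if it is missing.
if ! getent group examplegroup > /dev/null; then
  groupadd examplegroup
fi

# Create the user only if it is missing.
if ! getent passwd alice > /dev/null; then
  useradd -g examplegroup -G wheel alice
fi

# Repair group membership if it has drifted.
if ! id -nG alice | grep -q 'examplegroup wheel'; then
  usermod -g examplegroup -G wheel alice
fi

if ! test -d ~alice/.ssh; then
  mkdir -p ~alice/.ssh
fi

chown alice:examplegroup ~alice/.ssh

# sshd (StrictModes) rejects keys when ~/.ssh is group-writable.
chmod 700 ~alice/.ssh

# Append the key only if it is not already present.
if ! grep -q alice@localhost ~alice/.ssh/authorized_keys 2> /dev/null; then
  # The continuation line is not indented; leading spaces would end up inside the key.
  echo "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAm3TAgMF...AkxEnMsu\
FrD/E5TunvRHIczaI9Hy0IMXc= alice@localhost" >> ~alice/.ssh/authorized_keys
fi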

That’s quite a bit of code and it’s very specific to the current needs. This example covers only the use case of creating and managing two properties about the user. If our needs changed, we would need to write a much larger script to manage the user. Even fairly simple changes, such as revoking this user’s wheel access, could require significant changes.

This approach has another major disadvantage: it works only on Red Hat–based platforms that implement the same commands and arguments. Entirely different scripts would need to be written for other Linux platforms, Solaris, FreeBSD, macOS, and Windows.

Declarative replacement

Let’s look at our user management example using Puppet’s declarative DSL. Example 1-3 presents the code.

Example 1-3. Declarative user creation with Puppet
group { 'examplegroup':
  ensure => 'present',
}

user { 'alice':
  ensure     => 'present',
  gid        => 'examplegroup',
  managehome => true,
}

ssh_authorized_key { 'alice@localhost':
  ensure => 'present',
  user   => 'alice',
  type   => 'ssh-rsa',
  key    => 'AAAAB3NzaC1yc2EAAAABIwAAAIEAm3TAgMF/2RY+r7...vRHIczaI9Hy0IMXc='
}

Adding alice to the wheel group requires changing only one attribute of the user, the groups attribute, as shown here:

user { 'alice':
  ensure     => 'present',
  gid        => 'examplegroup',
  groups     => 'wheel',
  managehome => true,
}

Likewise, removing the example group and alice requires changing only one attribute of each:

group { 'examplegroup':
  ensure  => 'absent',
  require => User['alice'],
}

user { 'alice':
  ensure     => 'absent',
  gid        => 'examplegroup',
  groups     => 'wheel',
  managehome => true,
}

Warning

This was a simplified example. Compliance needs usually require disabling accounts rather than removing them to avoid user identifier (UID) reuse and preserve audit history.

In this example, we are able to remove the user by changing the ensure state from present to absent on the user’s resources.

Gain by replacing procedural code

Our procedural example showed a long script that could handle only a few attributes of a single user on a Red Hat Linux system. Each step of the example script had to analyze the current state to determine what action to take.

The much simpler resource declaration removes the need to treat each individual attribute as a new decision point. The same user declaration would also work on Ubuntu, a BSD Unix, or Windows, provided that a provider on that platform supports each attribute used. Perhaps most important, every line of the declarative example is focused on the data rather than on how to apply it. To make alice a member of a second supplementary group, we would need only to add that group to the groups attribute.

Tip

Rather than managing the user Alice as three unique resources, abstract this into a defined type that implements the related resources based on input parameters.
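A minimal sketch of such a defined type follows. The name example::ssh_user and all of its parameters are hypothetical, and the sketch covers only the present case; in a real module the definition would live in its own file (manifests/ssh_user.pp in a module named example). It also assumes the ssh_authorized_key type is available, which on recent Puppet versions is supplied by a separate module.

```puppet
# Hypothetical defined type; the name and parameters are illustrative only.
define example::ssh_user (
  String        $ssh_key,
  String        $primary_group,
  Array[String] $groups       = [],
  String        $ssh_key_type = 'ssh-rsa',
) {
  # The user resource autorequires the groups it references
  # when they are managed in the same catalog.
  user { $title:
    ensure     => 'present',
    gid        => $primary_group,
    groups     => $groups,
    managehome => true,
  }

  ssh_authorized_key { "${title}@localhost":
    ensure => 'present',
    user   => $title,
    type   => $ssh_key_type,
    key    => $ssh_key,
  }
}

# Usage: the shared primary group is declared once, outside the defined type.
group { 'examplegroup':
  ensure => 'present',
}

example::ssh_user { 'alice':
  primary_group => 'examplegroup',
  groups        => ['wheel'],
  ssh_key       => 'AAAAB3NzaC1yc2EAAAABIwAAAIEAm3TAgMF/2RY+r7...vRHIczaI9Hy0IMXc=',
}
```

Declaring the group outside the defined type is a deliberate choice in this sketch: several users can then share one primary group without producing duplicate resource declarations.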

Nondeclarative Code with Puppet

Most people use the operating system’s package provider (Yum, Apt, MSI, etc.) for consistency and reporting, but most of all for convenience. When they are required to install software outside of that system, they often revert to an imperative process that mirrors the steps they would have followed by hand. Example 1-4 shows that it is possible to write nondeclarative code with Puppet, but doing so comes at a cost.

Example 1-4. Nondeclarative application installation
$app_source  = 'http://www.example.com/application-1.2.3.tar.gz'
$app_tarball = '/tmp/application.tar.gz'

exec { 'download application':
  command => "/usr/bin/wget -q ${app_source} -O ${app_tarball}",
  creates => '/usr/local/application/',
  notify  => Exec['extract application'],
}

exec { 'extract application':
  command     => "/bin/tar -zxf ${app_tarball} -C /usr/local",
  refreshonly => true,
  creates     => '/usr/local/application/',
}

Example 1-4 shows a common example of nondeclarative Puppet code, often used to handle software unavailable in a native packaging format. This example has a few major problems:

  • exec resources have a default timeout of 300 seconds. This example might work well over a fast corporate network connection but fail completely over a slow home DSL line. The solution would be to set the timeout attribute of the exec resources to a reasonably high value.

  • This example does not validate the checksum of the downloaded file, which could produce some odd results upon extraction. We might use an additional exec resource to test and correct for this case automatically.

  • A partial or corrupted download can wedge this process. You can work around this problem by overwriting the archive each time it is downloaded.

  • This example makes several assumptions about the contents of application.tar.gz. If any of those assumptions are wrong, these commands will repeat every time Puppet is invoked.

  • This example is not portable and would require a platform-specific implementation for each supported OS.

  • This example would not be useful for upgrading the application.

Another common pattern is the use of conditional logic and custom facts to test for the presence of software. Example 1-5 looks quick and easy, but it is fragile and leaves Puppet completely unaware of the state of the system.

Example 1-5. Nondeclarative application install
# ensure custom app version 1.2.3
if $facts['custom_app_version'] != '1.2.3' {
  exec { 'download application':
    command => "/usr/bin/wget -q ${app_source} -O ${app_tarball}",
  }

  exec { 'extract application':
    command => "/bin/tar -zxf ${app_tarball} -C /usr/local",
    require => Exec['download application'],
  }
}

This particular example has the same problems as the previous example and introduces a new problem: it breaks Puppet’s reporting and auditing model. The conditional logic happens during the catalog build process and thus removes the application resources from the catalog following initial installation. Puppet can no longer validate or report on the state of those resources.

This approach is also sensitive to version issues insofar as future versions might have differing needs. This will lead to an imperative if/then/else testing tree like the one in the Bash script in Example 1-2.

Warning
An if or unless block adds or removes resources from the catalog entirely. Conditional evaluation is performed during the catalog build, and thus invisible to the Puppet agent implementing the changes.

This implementation will fail in unexpected ways; for example, not installing the software when expected to or continually reinstalling it when using the cached catalog. Using cached catalogs is a recommended and often-used practice to minimize disruption in production environments.

It is much better to let the resource provider determine compliance rather than trying to make compliance decisions by adding or removing things from the catalog. This allows the providers to report the current state and determine any necessary action. Think of this as enabling the driver to adjust for traffic conditions. Example 1-6 presents a declarative example using the puppet/archive module that doesn’t require a custom fact to gather the application version.

Example 1-6. Declarative, provider-implemented application installation
$app_version        = '1.2.3'
$app_source         = "http://www.example.com/application-${app_version}.tar.gz"
$app_tarball        = '/tmp/application.tar.gz'
$app_version_canary = "/usr/local/application/README.v${app_version}"
# the canary is a file that exists only in that version

archive { 'custom_application':
  path         => $app_tarball,
  source       => $app_source,
  extract      => true,
  extract_path => '/usr/local/',
  creates      => $app_version_canary,
  cleanup      => true,
}

As you can see, the simple declarative version uses all of the same data, without attempting to imperatively describe how to reach the end goal. This resource’s state will be tracked and reported by Puppet, making it visible for orchestration and reporting tools in the Puppet ecosystem.
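The checksum concern raised for the exec-based approach can be handled declaratively as well. The sketch that follows assumes the checksum and checksum_type parameters of the puppet/archive module. The digest shown is a placeholder made of zeros, not a real value, and would need to be replaced with the digest published for the tarball.

```puppet
$app_version        = '1.2.3'
$app_source         = "http://www.example.com/application-${app_version}.tar.gz"
$app_tarball        = '/tmp/application.tar.gz'
$app_version_canary = "/usr/local/application/README.v${app_version}"

# PLACEHOLDER digest (all zeros); substitute the real SHA-256 of the tarball.
$app_checksum = '0000000000000000000000000000000000000000000000000000000000000000'

archive { 'custom_application':
  path          => $app_tarball,
  source        => $app_source,
  checksum      => $app_checksum,
  checksum_type => 'sha256',
  extract       => true,
  extract_path  => '/usr/local/',
  creates       => $app_version_canary,
  cleanup       => true,
}
```

With the placeholder in place the resource should fail checksum verification, which is the intended behavior for a download that does not match the expected digest.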

Idempotency

In computer science, an idempotent operation produces the same result whether it is applied once or 100 times. For example, X = 1 is an idempotent operation. X = X + 1 is a nonidempotent operation: every invocation changes the result.

The Puppet language was designed to be idempotent. A large part of this idempotency is owed to its declarative resource type model; however, Puppet also enforces a number of rules on variable handling, iterators, and conditional logic to maintain idempotency.

Idempotence has major benefits for a configuration management language:

  • The configuration is inherently self-healing

  • State does not need to be maintained between invocations

  • Configurations can be safely reapplied

For example, if for some reason Puppet fails part way through a configuration run, reinvoking Puppet will complete the run and repair any configurations that were left in an inconsistent state by the previous run.

Side Effects

In computer science, a side effect is a change of system or program state that is outside the defined scope of the original operation. Declarative and idempotent languages usually attempt to manage, reduce, and eliminate side effects. With that said, it is entirely possible for an idempotent operation to have side effects.

Puppet attempts to limit side effects but does not eliminate them by any means; doing so would be nearly impossible given Puppet’s role as a system management tool.

Some side effects are designed into the system. For example, every resource will generate a notification upon changing a resource state that can be consumed by other resources. The notification is used to restart services in order to ensure that the running state of the system reflects the configured state. The filebucket operation that stores a backup of a file modified by Puppet is an intentional and beneficial side effect.
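The notification side effect might look like the following sketch, in which a change to a configuration file restarts the service that reads it. The example module name in the source URL is a hypothetical placeholder.

```puppet
file { '/etc/ssh/sshd_config':
  ensure => 'file',
  owner  => 'root',
  group  => 'root',
  mode   => '0600',
  # 'example' is a hypothetical module that ships the file.
  source => 'puppet:///modules/example/sshd_config',
  # Whenever Puppet changes this file, send a refresh event to the service.
  notify => Service['sshd'],
}

service { 'sshd':
  ensure => 'running',
  enable => true,
}
```

If the file already matches the source, no event is generated and the service is left alone.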

Some side effects are unavoidable. Reading a file on disk can cause that file’s atime to be updated unless the filesystem is mounted with the noatime option. This is of course true whether or not Puppet is being invoked in noop mode.

Resource-Level Idempotence

Many common tasks are not idempotent by nature, and will either throw an error or produce undesirable results if invoked multiple times.

The following code is not idempotent because it will set a state the first time, and throw an error each time it’s subsequently invoked, as shown here:

$ sudo useradd alice
$ sudo useradd alice
useradd: user 'alice' already exists

This code is not idempotent because it will add undesirable duplicate host entries each time it’s invoked:

echo '127.0.0.1 example.localdomain' >> /etc/hosts

The following code is idempotent, but will probably have undesirable results:

echo '127.0.0.1 example.localdomain' > /etc/hosts

To make our example idempotent without clobbering /etc/hosts, we can add a simple check before modifying the file:

grep -q '^127.0.0.1 example.localdomain$' /etc/hosts \
    || echo '127.0.0.1 example.localdomain' >> /etc/hosts

The same example is simple to write in a declarative and idempotent way using the host resource:

host { 'example.localdomain':
  ip => '127.0.0.1',
}

In this example, the resource is modeled in a declarative way and is idempotent by its very nature. Under the hood, Puppet handles the complexity of determining whether the line already exists and how it should be inserted into the underlying file. Using the host resource, Puppet also determines what file should be modified and where that file is located. The preceding declarative example will work on Windows and macOS, for example.

The idempotent examples are safe to run as many times as you like. This is a huge benefit in large environments; when trying to apply a change to thousands of hosts, it’s relatively common for failures to occur on a small subset of the hosts being managed. Perhaps the host is down during deployment? Perhaps you experienced some sort of transmission loss or timeout when deploying a change? If you are using an idempotent language or process to manage your systems, you can run the process repeatedly on the entire infrastructure with impact only on the exceptional cases that didn’t converge the first time.

When working with Puppet resources, you typically don’t need to worry about idempotence; most resource providers are idempotent by design. A couple of notable exceptions to this statement are the exec and augeas resources. We explore those in depth in Chapter 5.

Puppet does, however, attempt to track whether a resource has changed state. This is part of Puppet’s reporting mechanism and is used to determine whether a signal should be sent to resources with a notify relationship. Because Puppet tracks whether a resource has made a change, it’s entirely possible to write code that is functionally idempotent without meeting the criteria for idempotence in Puppet’s resource model.

The code example that follows is functionally idempotent: this command will always result in the same state. However, because the exec resource knows only whether or not a command succeeded—not the resulting configuration created—it will report a state change with every Puppet run. This will make monitoring convergence reports for unexpected changes more difficult.

exec { 'grep -q /bin/bash /etc/shells || echo /bin/bash >> /etc/shells':
  path     => '/bin',
  provider => 'shell',
}

Puppet’s idempotence model relies on a special aspect of its resource model. For every resource, Puppet first determines that resource’s current state. If the current state does not match the defined state of that resource, Puppet invokes the appropriate methods on the resource’s provider to bring the resource into conformity with the desired state. This usually happens in a transparent manner; however, there are a few exceptions that we discuss in their respective chapters. Understanding these cases is critical in order to avoid breaking Puppet’s simulation and reporting models.

The following example uses state verification to prevent the resource from being converged unnecessarily:

exec { 'echo /bin/bash >> /etc/shells':
  path   => '/bin',
  unless => 'grep -q /bin/bash /etc/shells',
}

In this example, the unless attribute provides a condition Puppet can use to determine whether a change actually needs to take place.

Tip

Using the state verification attributes unless and onlyif properly will help reduce the risk of exec resources. We will explore this in Chapter 5.
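Whereas unless skips the command when the check succeeds, onlyif runs the command only when the check succeeds. The sketch that follows is an assumption-laden illustration: it presumes a mail system that compiles /etc/aliases into /etc/aliases.db with the newaliases command, and a test command that supports the -nt (newer than) comparison.

```puppet
# Rebuild the aliases database only when the source file is newer than it.
# The file locations and the availability of newaliases are assumptions.
exec { 'newaliases':
  path   => ['/usr/bin', '/bin', '/usr/sbin'],
  onlyif => 'test /etc/aliases -nt /etc/aliases.db',
}
```

When the database is already up to date, the check fails, the command is skipped, and no change is reported.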

An even better implementation would be to forgo exec in favor of a resource type that would manage the state for us. For example, we could implement this using the file_line resource from the puppetlabs/stdlib module:

file_line { 'bash_in_shells':
  path => '/etc/shells',
  line => '/bin/bash',
}

Do you see how much simpler and easier to read it is when you no longer need to describe how to accomplish the change?

A final surprising example of unintended side effects is the notify resource, used to produce debugging information and log entries:

notify { 'example':
  message => 'Danger, Will Robinson!',
}

The notify resource generates an alert every time it is invoked and will always report as a change in system state. As such, you should use notify resources only to bring attention to exception cases, not for general-purpose information. If you’d like to display informational messages without causing a change report, you would need to use alternatives like the ipcrm/echo module on the Puppet Forge.

Run-Level Idempotence

Puppet is designed to be idempotent both at the resource level and at the run level. Much like resource idempotence means that a resource applied twice produces the same result, run-level idempotence means that invoking Puppet multiple times on a host will be safe. Puppet’s default agent run model performs a state evaluation every 30 minutes, and is widely used in live production environments.

Tip

You can run Puppet in nonenforcing --noop (no-operation) mode so as to only report on variance from the model in change-sensitive production environments.
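The --noop flag applies to an entire run. Where only part of a catalog is change-sensitive, the noop metaparameter can be set on individual resources instead. The following is a minimal sketch; the file and its content are arbitrary examples.

```puppet
# Puppet reports what it would change here but never modifies the file,
# even during a normal enforcing run.
file { '/etc/motd':
  ensure  => 'file',
  content => "Managed by Puppet\n",
  noop    => true,
}
```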

Run-level idempotence is a place where Puppet’s idea of change is just as important as whether the resources are functionally idempotent. Remember that before performing any configuration change, Puppet must first determine whether the resource currently conforms to policy. Puppet will make a change only if a resource’s state doesn’t match the declaration. If Puppet does not report having made any changes, the practical implication is that the resources were already in the desired state.

Tip

Determining where your Puppet runs are not idempotent is fairly simple: If Puppet reports a change on an immediately successive invocation, that change identifies Puppet code that fails to be idempotent.

Because changes to a Puppet resource can have side effects that affect other resources, Puppet’s idempotence model can be broken if we don’t carefully declare resource dependencies, as illustrated here:

package { 'httpd':
  ensure => 'installed',
}

file { '/etc/httpd/conf/httpd.conf':
  ensure  => 'file',
  content => template('apache/httpd.conf.erb'),
}

Package['httpd'] -> File['/etc/httpd/conf/httpd.conf']

The file resource will not create paths recursively. In the preceding example, httpd.conf can be written only after the /etc/httpd/conf/ directory exists, and that directory is created by the httpd package. The package must therefore be installed before the file resource is enforced. If this dependency is not declared, the file resource becomes nonidempotent; upon first invocation Puppet might throw an error, and enforce the state of httpd.conf only upon subsequent invocations.

Such issues will render Puppet eventually convergent. Because Puppet typically runs on a 30-minute interval, eventually convergent infrastructures can take a very long time to reach a converged state.

Nondeterministic Code

As a general rule, the Puppet DSL is deterministic, meaning that a given set of inputs (manifests, facts, exported resources, etc.) will always produce the same output with no variance.

For example, the language does not implement a random() function; instead, a fqdn_rand() function is provided that returns predictable and repeatable random values derived from a static seed (the node’s fully qualified domain name). This function is neither cryptographically secure nor random. It produces the same random-ish number every time it is called for the same node. This is commonly used to distribute the start times of load-intensive tasks.
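As a sketch of that use, the cron resource below spreads a nightly job across the hour so that not every node starts at the same minute. The job name and script path are hypothetical. The optional second argument to fqdn_rand() is a seed that lets different jobs on the same node receive different offsets.

```puppet
# Each node receives a stable minute between 0 and 59, derived from its FQDN.
# The job name and script path are hypothetical.
cron { 'nightly_backup':
  command => '/usr/local/bin/backup.sh',
  user    => 'root',
  hour    => 2,
  minute  => fqdn_rand(60, 'nightly_backup'),
}
```

Because the value is derived from the node name and the seed, repeated catalog builds for the same node produce the same schedule, and the resource does not report a change on every run.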

Nondeterministic code can pop up in strange places with Puppet. A common cause is code that depends on a transient or changeable value.

The following code uses the Puppet server’s name in the file:

file { '/tmp/example.txt':
  ensure  => 'file',
  content => "# File managed by Puppet, server ${::servername}\n",
}

This code will not be idempotent when used with a cluster of Puppet servers. The value of $::servername changes depending on which server compiles the catalog for a particular run. This means that change reports won’t differentiate between nodes with unexpected changes and nodes whose catalog was simply compiled by a different server in the cluster, with no real change to their resources.

For each invocation of Puppet with nondeterministic code, some resources will change. The node catalog will converge, but it will always report your systems as having been brought into conformity with its policy rather than being conformant. This makes it virtually impossible to determine whether changes would be made to a node. Unexpected changes are far easier to pick out when there are only a handful of change reports instead of hundreds or thousands of false positives.

Nondeterministic code that utilizes subscribe or notify can create unintended side effects. If a resource is converged that notifies or is subscribed to by a service, the service will restart. This can cause unintended service disruption.
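One way to restore determinism in the file resource shown earlier in this section is simply to drop the transient value from the managed content. This is a sketch of that idea, not the only possible fix.

```puppet
# The content no longer depends on which server compiled the catalog,
# so every compile produces the same file.
file { '/tmp/example.txt':
  ensure  => 'file',
  content => "# File managed by Puppet\n",
}
```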

Stateless

A stateless interface does not preserve state between requests; each request is completely independent from previous requests.

Note

Puppet uses a RESTful API over HTTPS for client–server communications.

Puppet’s agent/server API is stateless. The catalog build process does not consult data from a previous request to produce a new catalog for the node. Unless state information is provided by custom node facts, node classifiers, or Hiera lookups, node catalogs are built in a completely stateless, deterministic manner.

Benefits of Stateless Design

Tip
This section discusses implementations in which the Puppet Server (master) process builds the catalog for the node. The catalog build process is identical in every way for serverless Puppet implementations, except that puppet apply builds the catalog locally.

Puppet uses the facts supplied by the node to build a catalog for it. The catalog building process doesn’t know or care whether this is the first time it has generated a catalog for this node, whether the last run was successful, or if any change occurred on the node during the last run. The node’s catalog is built from scratch every time the node requests a catalog. The catalog builder (master) creates the model. The Puppet agent on the node has the responsibility of comparing the current state of the node to the catalog and applying change as necessary to match the model.

Keeping Puppet stateless can be tremendously useful. In a stateless system, there is no need to synchronize data or resolve conflicts between masters. There is no locking to worry about. There is no need to design a partition-tolerant system in case you lose a datacenter or uplink, and there’s no need to worry about clustering strategies. You can easily distribute load across a pool of masters by using a load balancer or DNS SRV record.

It is possible to track state on the agent, and submit state information to the master using Puppet facts. The catalog build is customized with the values provided by facts. There are cases for which security requirements or particularly idiosyncratic software will necessitate creating custom facts to manage bespoke dependencies with Puppet’s DSL.

If you keep your code declarative, it’s easy to work with Puppet’s stateless agent/server configuration model. If a manifest declares that a user resource should exist, the catalog builder is not concerned with the current state of that resource. The catalog declares the desired state, and the Puppet agent enforces that state.

Puppet’s stateless model has several major advantages over a stateful model:

  • Lack of state allows Puppet servers to scale horizontally.

  • Catalogs can be compared.

  • Catalogs can be cached locally to reduce server load.

It is worth noting that there are a few stateful features of Puppet. It’s important to weigh the value of these features against the cost of making your Puppet infrastructure stateful, and to design your infrastructure to provide an acceptable level of availability and fault tolerance. We discuss how to approach each of these technologies in upcoming chapters, but we provide a quick overview here.

Sources of State Information

At the beginning of this section, we mentioned that a few features can provide state information for use in the Puppet catalog build. Let’s look at some of these features in a bit more depth.

Filebuckets

Filebuckets, an underappreciated feature of the file resource, provide a history of the changes Puppet makes to files. If a filebucket is configured, the file provider creates a backup copy of a file before changing it. You can store the backup locally, or you can submit it to a server-based bucket.

Bucketing your files is useful for keeping backups, auditing, reporting, and disaster recovery. It can be immensely useful for restoring a file broken by a Puppet change, or for comparing file versions when testing changes to Puppet resources. Both command-line utilities and web consoles can display and compare the contents of files stored in the bucket.
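The following sketch shows one way a server-based bucket might be configured. The bucket title 'main' and the server name puppet.example.com are assumptions for illustration; substitute the values used in your environment.

```puppet
# Declare a remote filebucket. The server name is a placeholder.
filebucket { 'main':
  path   => false,                  # false means do not use a local bucket
  server => 'puppet.example.com',
}

# Resource default: back up every managed file to the 'main' bucket
# before Puppet changes it.
File {
  backup => 'main',
}
```

Resource defaults such as the File block above apply within the scope where they are declared, so this kind of default is usually placed in a top-level manifest.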

Exported resources

Exported resources provide a simple service discovery mechanism for Puppet. When a Puppet master builds a catalog, resources can be marked as exported by the compiler. After the resources are marked as exported, they are recorded in Puppet’s database, PuppetDB. Other nodes can then collect the exported resources and apply those resources locally. Exported resources persist until they are overwritten or purged by a later Puppet run on the same node. Be aware that exported resources introduce a source of state into your infrastructure.

In this example, a pool of web servers exports its pool membership information to an HAProxy load balancer using the puppetlabs/haproxy module and exported resources.

Each node would create an exported resource with its own information, as demonstrated here:

# Export a balance member resource with our details
@@haproxy::balancermember { $::fqdn:
  listening_service => 'web',
  server_names      => $::hostname,
  ipaddresses       => $::ipaddress,
  ports             => '80',
  options           => 'check',
}

The load balancer collects all of these balance members to create the service pool:

# Define the frontend listener
haproxy::listen { 'web':
  ipaddress => $::ipaddress,
  ports     => '80',
}

# collect all exported backends for the service
Haproxy::Balancermember <<| listening_service == 'web' |>>

Tip
The preceding example shows a fairly safe use of exported resources. If PuppetDB became unavailable, the pool would continue to work with its existing members. New nodes would be added when PuppetDB was available again.

Exported resources rely on PuppetDB, which stores the data in a PostgreSQL database. Although there are several methods available to make PuppetDB fault tolerant, including the Puppet Enterprise high availability configuration, the use of exported resources does introduce a dependency and a possible point of failure into the infrastructure.

Hiera

Hiera is by design a pluggable system. By default, it provides YAML, JSON, and HOCON backends, all of which are stateless. However, it is possible to design a custom backend for Hiera that sources a database or inventory service, such as PuppetDB. If you use this approach, it can introduce a source of state information for your Puppet infrastructure. We explore Hiera in depth in Chapter 6.
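One consequence worth illustrating is that manifest code looks the same regardless of which backend answers the query. In the sketch below, the key name ntp_server, the default value, and the file path are hypothetical. Whether the value comes from a stateless YAML file or from a stateful custom backend is not visible in the manifest, which is why it pays to know what your backends depend on.

```puppet
# Look up a hypothetical key, falling back to a default if no backend
# supplies a value. Arguments: name, expected type, merge behavior, default.
$ntp_server = lookup('ntp_server', String, 'first', 'pool.ntp.org')

# Hypothetical drop-in file used only to show the value being consumed.
file { '/etc/ntp-server.conf':
  ensure  => file,
  content => "server ${ntp_server}\n",
}
```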

Inventory and reporting

Puppet Server stores a considerable amount of information pertaining to the state of each node. This information includes the facts supplied by the node, the catalog last sent to the node, and the convergence reports produced by each Puppet run on the node. Even though this information is stateful, it is not typically consumed when building the node’s catalog. We take a close look at inventory and reporting services in “Inventory and Infrastructure Management ENCs”.

There are Puppet extensions that provide inventory information for use during the catalog build; however, these are not core to Puppet.

Custom facts

Custom facts can provide current node state information for use in Puppet manifests. Facts provided by the node avoid creating scaling and availability problems inherent in server-side state storage and management.

Example 1-5 showed how the use of custom facts for conditional inclusion can create nondeclarative, fragile code.
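By contrast, a fact can be consumed as a plain data value without changing which resources the catalog contains. The sketch below assumes a hypothetical custom fact named datacenter and a hypothetical file path; neither is part of core Puppet. The file resource is always declared, and only its content varies with the fact.

```puppet
# 'datacenter' is a hypothetical custom fact. Fall back to a fixed
# string when the node does not supply it.
$datacenter = $facts['datacenter'] ? {
  undef   => 'unknown',
  default => $facts['datacenter'],
}

# The resource is declared unconditionally; the fact only supplies data.
file { '/etc/datacenter':
  ensure  => file,
  content => "${datacenter}\n",
}
```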

Summary

In this chapter, we reviewed how Puppet’s declarative design provides simplicity, flexibility, and power. We also reviewed how declarative design requires new approaches, and restricts the usage of simple imperative approaches.

Here are some takeaways from this chapter:

  • Puppet is declarative, idempotent, and stateless by default.

  • In some cases, violating these design ideals is unavoidable.

  • You should write declarative, idempotent, and stateless code whenever possible.

Each chapter in this book provides concrete recommendations for the effective usage of Puppet’s language, resource types, and providers. Building code that uses Puppet’s declarative model will be the major driving force behind each topic in future chapters.
