Chapter 4. Inventory: Describing Your Servers

So far, we’ve been working with only one server (or host, as Ansible calls it). The simplest inventory is a comma-separated list of hostnames passed on the command line, which you can try even without a server:

$ ansible all -i 'localhost,' -a date

In reality, you’re going to be managing multiple hosts. The collection of hosts that Ansible knows about is called the inventory. In this chapter, you will learn how to describe a set of hosts as an Ansible inventory by creating an inventory that contains multiple machines.

Your ansible.cfg file should look like Example 4-1, which enables all inventory plug-ins explicitly.

Example 4-1. ansible.cfg
[defaults]
inventory = inventory

[inventory]
enable_plugins = host_list, script, auto, yaml, ini, toml

In this chapter, we will use a directory named inventory for the inventory examples. The Ansible inventory is a very flexible object: it can be a file (in several formats), a directory, or an executable, and some executables are bundled as plug-ins. Inventory plug-ins allow us to point at data sources, like your cloud provider, to compile the inventory. An inventory can be stored separately from your playbooks. This means that you can create one inventory directory to use with Ansible on the command line, with hosts running in Vagrant, Amazon EC2, Google Cloud Platform, or Microsoft Azure, or wherever you like!

Note

Serge van Ginderachter is the most knowledgeable person to read on Ansible inventory. See his blog for in-depth details.

Inventory/Hosts Files

The default way to describe your hosts in Ansible is to list them in text files, called inventory hosts files. The simplest form is just a list of hostnames in a file named hosts, as shown in Example 4-2.

Example 4-2. A very simple inventory file
frankfurt.example.com
helsinki.example.com
hongkong.example.com
johannesburg.example.com
london.example.com
newyork.example.com
seoul.example.com
sydney.example.com

Ansible automatically adds one host to the inventory by default: localhost. It understands that localhost refers to your local machine, with which it will interact directly rather than connecting by SSH.

Preliminaries: Multiple Vagrant Machines

To talk about inventory, you’ll need to interact with multiple hosts. Let’s configure Vagrant to bring up three hosts. We’ll unimaginatively call them vagrant1, vagrant2, and vagrant3.

Before you create a new Vagrantfile for this chapter, make sure you destroy your existing virtual machine(s) by running the following:

$ vagrant destroy --force

If you don’t include the --force option, Vagrant will prompt you to confirm that you want to destroy each virtual machine listed in the Vagrantfile.

Next, create a new Vagrantfile that looks like Example 4-3.

Example 4-3. Vagrantfile with three servers
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # Use the same key for each machine
  config.ssh.insert_key = false

  config.vm.define "vagrant1" do |vagrant1|
    vagrant1.vm.box = "ubuntu/focal64"
    vagrant1.vm.network "forwarded_port", guest: 80, host: 8080
    vagrant1.vm.network "forwarded_port", guest: 443, host: 8443
  end
  config.vm.define "vagrant2" do |vagrant2|
    vagrant2.vm.box = "ubuntu/focal64"
    vagrant2.vm.network "forwarded_port", guest: 80, host: 8081
    vagrant2.vm.network "forwarded_port", guest: 443, host: 8444
  end
  config.vm.define "vagrant3" do |vagrant3|
    vagrant3.vm.box = "centos/stream8"
    vagrant3.vm.network "forwarded_port", guest: 80, host: 8082
    vagrant3.vm.network "forwarded_port", guest: 443, host: 8445
  end
end

Vagrant, from version 1.7 on, defaults to using a different SSH key for each host. Example 4-3 contains the line to revert to the earlier behavior of using the same SSH key for each host:

config.ssh.insert_key = false

Using the same key on each host simplifies our Ansible setup because we can specify a single SSH key in the configuration.

For now, let’s assume that each of these servers can potentially be a web server, so Example 4-3 maps ports 80 and 443 inside each Vagrant machine to a port on the local machine.

We should be able to bring up the virtual machines by running the following:

$ vagrant up

If all goes well, the output should look something like this:

Bringing machine 'vagrant1' up with 'virtualbox' provider...
Bringing machine 'vagrant2' up with 'virtualbox' provider...
Bringing machine 'vagrant3' up with 'virtualbox' provider...
...
    vagrant1: 80 (guest) => 8080 (host) (adapter 1)
    vagrant1: 443 (guest) => 8443 (host) (adapter 1)
    vagrant1: 22 (guest) => 2222 (host) (adapter 1)
==> vagrant1: Running 'pre-boot' VM customizations...
==> vagrant1: Booting VM...
==> vagrant1: Waiting for machine to boot. This may take a few minutes...
    vagrant1: SSH address: 127.0.0.1:2222
    vagrant1: SSH username: vagrant
    vagrant1: SSH auth method: private key
==> vagrant1: Machine booted and ready!
==> vagrant1: Checking for guest additions in VM...
==> vagrant1: Mounting shared folders...
    vagrant1: /vagrant => /Users/bas/code/ansible/ansiblebook/ansiblebook/ch03

Next, we need to know what ports on the local machine map to the SSH port (22) inside each VM. Recall that we can get that information by running the following:

$ vagrant ssh-config

The output should look something like this:

Host vagrant1
  HostName 127.0.0.1
  User vagrant
  Port 2222
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/lorin/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL
Host vagrant2
  HostName 127.0.0.1
  User vagrant
  Port 2200
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/lorin/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL
Host vagrant3
  HostName 127.0.0.1
  User vagrant
  Port 2201
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/lorin/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL

A lot of the ssh-config information is repetitive and can be reduced. The information that differs per host is that vagrant1 uses port 2222, vagrant2 uses port 2200, and vagrant3 uses port 2201.

Ansible uses your local SSH client by default, which means that it will understand any aliases that you set up in your SSH config file. Therefore, we use a wildcard alias in the file ~/.ssh/config:

Host vagrant*
  Hostname 127.0.0.1
  User vagrant
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile ~/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL

Modify your inventory/hosts file so it looks like this:

vagrant1 ansible_port=2222
vagrant2 ansible_port=2200
vagrant3 ansible_port=2201
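
If you would rather not copy the port numbers by hand, the host lines above can be derived from the ssh-config output. The following is a minimal sketch, not part of the book's code: the function name inventory_lines and the SSH_CONFIG sample (a shortened copy of the output shown earlier) are our own illustrative choices.

```python
# Shortened sample of `vagrant ssh-config` output (only the fields we need).
SSH_CONFIG = """\
Host vagrant1
  HostName 127.0.0.1
  User vagrant
  Port 2222
Host vagrant2
  HostName 127.0.0.1
  User vagrant
  Port 2200
Host vagrant3
  HostName 127.0.0.1
  User vagrant
  Port 2201
"""


def inventory_lines(ssh_config_text):
    """Turn ssh-config style text into '<alias> ansible_port=<port>' lines."""
    lines = []
    current_host = None
    for raw in ssh_config_text.splitlines():
        parts = raw.split()
        if len(parts) < 2:
            continue
        key, value = parts[0].lower(), parts[1]
        if key == "host":
            # A new "Host" stanza starts; remember its alias.
            current_host = value
        elif key == "port" and current_host is not None:
            lines.append(f"{current_host} ansible_port={value}")
    return lines


print("\n".join(inventory_lines(SSH_CONFIG)))
```

In practice you would feed the function the real output of vagrant ssh-config rather than a pasted sample.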

Now, make sure that you can access these machines. For example, to get information about the network interface for vagrant2, run the following:

$ ansible vagrant2 -a "ip addr show dev enp0s3"

Your output should look something like this:

vagrant2 | CHANGED | rc=0 >>
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 02:1e:de:45:2c:c8 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 86178sec preferred_lft 86178sec
    inet6 fe80::1e:deff:fe45:2cc8/64 scope link
       valid_lft forever preferred_lft forever

Behavioral Inventory Parameters

To describe our Vagrant machines in the Ansible inventory file, we had to explicitly specify the port (2222, 2200, or 2201) to which Ansible’s SSH client should connect. Ansible calls such variables behavioral inventory parameters, and there are several of them you can use when you need to override the Ansible defaults for a host (see Table 4-1).

Table 4-1. Behavioral inventory parameters
Name                          Default          Description
ansible_host                  Name of host     Hostname or IP address to SSH to
ansible_port                  22               Port to SSH to
ansible_user                  $USER            User to SSH as
ansible_password              (None)           Password to use for SSH authentication
ansible_connection            smart            How Ansible will connect to host (see the following section)
ansible_ssh_private_key_file  (None)           SSH private key to use for SSH authentication
ansible_shell_type            sh               Shell to use for commands (see the following section)
ansible_python_interpreter    /usr/bin/python  Python interpreter on host (see the following section)
ansible_*_interpreter         (None)           Like ansible_python_interpreter for other languages (see the following section)

For some of these options, the meaning is obvious from the name, but others require more explanation:

ansible_connection

Ansible supports multiple transports, which are mechanisms that Ansible uses to connect to the host. The default transport, smart, will check whether the locally installed SSH client supports a feature called ControlPersist. If the SSH client supports ControlPersist, Ansible will use the local SSH client. If not, the smart transport will fall back to using a Python-based SSH client library called Paramiko.

ansible_shell_type

Ansible works by making SSH connections to remote machines and then invoking scripts. By default, Ansible assumes that the remote shell is the Bourne shell located at /bin/sh, and will generate the appropriate command-line parameters that work with that. It creates temporary directories to store these scripts.

Ansible also accepts csh, fish, and (on Windows) powershell as valid values for this parameter. Ansible doesn’t work with restricted shells.

ansible_python_interpreter

Ansible needs to know the location of the Python interpreter on the remote machine. You might want to change this to choose a version that works for you. The easiest way to run Ansible under Python 3 is to install it with pip3 and set this:

ansible_python_interpreter="/usr/bin/env python3"

ansible_*_interpreter

If you are using a custom module that is not written in Python, you can use this parameter to specify the location of the interpreter (such as /usr/bin/ruby). We’ll cover this in Chapter 12.

Changing Behavioral Parameter Defaults

You can override some of the behavioral parameter default values in the inventory file, or you can override them in the defaults section of the ansible.cfg file (Table 4-2). Consider where you change these parameters. Are the changes a personal choice, or does the change apply to your whole team? Does a part of your inventory need a different setting? Remember that you can configure SSH preferences in the ~/.ssh/config file.

Table 4-2. Defaults that can be overridden in ansible.cfg
Behavioral inventory parameter  ansible.cfg option
ansible_port                    remote_port
ansible_user                    remote_user
ansible_ssh_private_key_file    private_key_file
ansible_shell_type              executable (see the following paragraph)

The ansible.cfg executable config option is not exactly the same as the ansible_shell_type behavioral inventory parameter. The executable specifies the full path of the shell to use on the remote machine (for example, /usr/local/bin/fish). Ansible will look at the base name of this path (in this case fish) and use that as the default value for ansible_shell_type.
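
The base-name rule is easy to picture in a couple of lines of Python. This is only an illustration of the rule as described above, not Ansible's own code, and the function name default_shell_type is hypothetical.

```python
import os.path


def default_shell_type(executable):
    """Return the base name of the shell path, as the rule above describes."""
    return os.path.basename(executable)


print(default_shell_type("/usr/local/bin/fish"))  # prints: fish
```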

Groups and Groups and Groups

We typically want to perform configuration actions on groups of hosts, rather than on an individual host. Ansible automatically defines a group called all (or *), which includes all the hosts in the inventory. For example, we can check whether the clocks on the machines are roughly synchronized by running the following:

$ ansible all -a "date"

or

$ ansible '*' -a "date"

The output on Bas’s system looks like this:

vagrant2 | CHANGED | rc=0 >>
Wed 12 May 2021 01:37:47 PM UTC
vagrant1 | CHANGED | rc=0 >>
Wed 12 May 2021 01:37:47 PM UTC
vagrant3 | CHANGED | rc=0 >>
Wed 12 May 2021 01:37:47 PM UTC

We can define our own groups in the inventory hosts file. Ansible uses the .ini file format for inventory hosts files; it groups configuration values into sections.

Here’s how to specify that our vagrant hosts are in a group called vagrant, along with the other example hosts mentioned at the beginning of the chapter:

frankfurt.example.com
helsinki.example.com
hongkong.example.com
johannesburg.example.com
london.example.com
newyork.example.com
seoul.example.com
sydney.example.com

[vagrant]
vagrant1 ansible_port=2222
vagrant2 ansible_port=2200
vagrant3 ansible_port=2201

We could alternately list the Vagrant hosts at the top and then also in a group, like this:

frankfurt.example.com
helsinki.example.com
hongkong.example.com
johannesburg.example.com
london.example.com
newyork.example.com
seoul.example.com
sydney.example.com
vagrant1 ansible_port=2222
vagrant2 ansible_port=2200
vagrant3 ansible_port=2201

[vagrant]
vagrant1
vagrant2
vagrant3

You can use groups in any way that suits you: they can overlap or be nested, however you like. The order does not matter, except for human readability.

Example: Deploying a Django App

Imagine you’re responsible for deploying a Django-based web application that processes long-running jobs. The app needs to support the following services:

  • The actual Django web app itself, run by a Gunicorn HTTP server

  • A NGINX web server, which will sit in front of Gunicorn and serve static assets

  • A Celery task queue that will execute long-running jobs on behalf of the web app

  • A RabbitMQ message queue that serves as the backend for Celery

  • A Postgres database that serves as the persistent store

In later chapters, we will work through a detailed example of deploying this kind of Django-based application, although our example won’t use Celery or RabbitMQ. For now, we need to deploy this application into three different environments: production (the real thing), staging (for testing on hosts that our team has shared access to), and Vagrant (for local testing).

When we deploy to production, we want the entire system to respond quickly and reliably, so we do the following:

  • Run the web application on multiple hosts for better performance and put a load balancer in front of them

  • Run task queue servers on multiple hosts for better performance

  • Put Gunicorn, Celery, RabbitMQ, and Postgres all on separate servers

  • Use two Postgres hosts: a primary and a replica

Assuming we have one load balancer, three web servers, three task queues, one RabbitMQ server, and two database servers, that’s 10 hosts we need to deal with (Figure 4-1).

Figure 4-1. Ten hosts for deploying a Django app

For our staging environment, we want to use fewer hosts than we do in production to save costs, since it’s going to see a lot less activity than production will. Let’s say we decide to use only two hosts for staging; we’ll put the web server and task queue on one staging host, and RabbitMQ and Postgres on the other.

For our local Vagrant environment, we decide to use three servers: one for the web app, one for a task queue, and one that will contain RabbitMQ and Postgres.

Example 4-4 shows a sample inventory file that groups servers by environment (production, staging, Vagrant) and by function (web server, task queue, etc.).

Example 4-4. Inventory file for deploying a Django app
[production]
frankfurt.example.com
helsinki.example.com
hongkong.example.com
johannesburg.example.com
london.example.com
newyork.example.com
seoul.example.com
sydney.example.com
tokyo.example.com
toronto.example.com

[staging]
amsterdam.example.com
chicago.example.com

[vagrant]
vagrant1 ansible_port=2222
vagrant2 ansible_port=2200
vagrant3 ansible_port=2201

[lb]
helsinki.example.com

[web]
amsterdam.example.com
seoul.example.com
sydney.example.com
toronto.example.com
vagrant1

[task]
amsterdam.example.com
hongkong.example.com
johannesburg.example.com
newyork.example.com
vagrant2

[rabbitmq]
chicago.example.com
tokyo.example.com
vagrant3

[db]
chicago.example.com
frankfurt.example.com
london.example.com
vagrant3

We could have first listed all of the servers at the top of the inventory file, without specifying a group, but that isn’t necessary, and that would’ve made this file even longer.

Note that we need to specify the behavioral inventory parameters for the Vagrant instances only once.

Aliases and Ports

We have described our Vagrant hosts like this:

[vagrant]
vagrant1 ansible_port=2222
vagrant2 ansible_port=2200
vagrant3 ansible_port=2201

The names vagrant1, vagrant2, and vagrant3 here are aliases. They are not the real hostnames, just useful names for referring to these hosts. Ansible resolves hostnames using the inventory, your SSH config file, /etc/hosts, and DNS. This flexibility is useful in development but can be a cause of confusion.

Ansible also supports using <hostname>:<port> syntax when specifying hosts, so we could replace the line that contains vagrant1 with 127.0.0.1:2222 (Example 4-5).

Example 4-5. This doesn’t work
[vagrant]
127.0.0.1:2222
127.0.0.1:2200
127.0.0.1:2201

However, we can’t actually run what you see in Example 4-5. The reason is that Ansible’s inventory can associate only a single host with 127.0.0.1, so the Vagrant group would contain only one host instead of three.
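
A toy model makes the collision easy to see. The sketch below is not how Ansible stores its inventory internally; it only assumes, as stated above, that hosts are keyed by name, so three entries with the same name leave a single host behind.

```python
# Toy model: an inventory keyed by host name (an assumption for illustration).
entries = ["127.0.0.1:2222", "127.0.0.1:2200", "127.0.0.1:2201"]

inventory = {}
for entry in entries:
    host, port = entry.rsplit(":", 1)
    # Every entry has the same key, so each one replaces the previous one.
    inventory[host] = {"ansible_port": int(port)}

print(len(inventory))  # prints: 1
```

Aliases such as vagrant1, vagrant2, and vagrant3 avoid the problem because each host gets a distinct name.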

Groups of Groups

Ansible also allows you to define groups that are made up of other groups. For example, since both the web servers and the task queue servers will need Django and its dependencies, it might be useful to define a django group that contains both. You would add this to the inventory file:

[django:children]
web
task

Note that the syntax changes when you are specifying a group of groups, as opposed to a group of hosts. That’s so Ansible knows to interpret web and task as groups and not as hosts.
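
To make the idea concrete, here is a minimal sketch of how a group of groups could be resolved into hosts. It is our own illustration rather than Ansible's implementation; the function hosts_in and the two dictionaries are hypothetical, and the host names come from Example 4-4.

```python
def hosts_in(group, groups, children, _seen=None):
    """Return the hosts of `group`, including the hosts of its child groups."""
    seen = _seen if _seen is not None else set()
    if group in seen:  # guard against accidental cycles
        return []
    seen.add(group)
    hosts = list(groups.get(group, []))
    for child in children.get(group, []):
        for host in hosts_in(child, groups, children, seen):
            if host not in hosts:  # a host may belong to several groups
                hosts.append(host)
    return hosts


# Groups of hosts, as in the [web] and [task] sections.
groups = {
    "web": ["amsterdam.example.com", "seoul.example.com",
            "sydney.example.com", "toronto.example.com", "vagrant1"],
    "task": ["amsterdam.example.com", "hongkong.example.com",
             "johannesburg.example.com", "newyork.example.com", "vagrant2"],
}
# Groups of groups, as in the [django:children] section.
children = {"django": ["web", "task"]}

print(hosts_in("django", groups, children))
```

Note that amsterdam.example.com is in both web and task but appears only once in the django group.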

Numbered Hosts (Pets Versus Cattle)

The inventory file you saw back in Example 4-4 looks complex. It describes 15 hosts, which doesn’t sound like a large number in this cloudy, scale-out world. However, dealing with 15 hosts in the inventory file can be cumbersome, because each host has a completely different hostname.

Bill Baker of Microsoft came up with the distinction between treating servers as pets versus treating them like cattle.1 We give pets distinctive names and treat and care for them as individuals; with cattle, though, we refer to them by identification number and treat them as livestock.

The “cattle” approach to servers is much more scalable, and Ansible supports it well by supporting numeric patterns. For example, if your 20 servers are named web1.example.com, web2.example.com, and so on, then you can specify them in the inventory file like this:

[web]
web[1:20].example.com

If you prefer to have a leading zero (such as web01.example.com), specify that in the range, like this:

[web]
web[01:20].example.com

Ansible also supports using alphabetic characters to specify ranges. If you want to use the convention web-a.example.com, web-b.example.com, and so on, for your 20 servers, then you can do this:

[web]
web-[a:t].example.com
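
If you want to check what a pattern will expand to, the following sketch mimics the three range styles shown above. It is an approximation written for this chapter, not Ansible's parser: the function name expand is hypothetical, it handles only one range per pattern, and alphabetic ranges are limited to single lowercase letters.

```python
import re
import string

# One [start:end] range, either numeric or single lowercase letters.
RANGE = re.compile(r"\[(\d+|[a-z]):(\d+|[a-z])\]")


def expand(pattern):
    """Expand a single [start:end] range in an inventory host pattern."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.groups()
    head, tail = pattern[:match.start()], pattern[match.end():]
    if start.isdigit() and end.isdigit():
        # A leading zero in the start value asks for zero-padded numbers.
        width = len(start) if start.startswith("0") else 0
        values = [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
    elif start.isalpha() and end.isalpha():
        letters = string.ascii_lowercase
        values = list(letters[letters.index(start):letters.index(end) + 1])
    else:
        raise ValueError(f"mixed range in pattern: {pattern}")
    return [head + value + tail for value in values]


print(expand("web[01:20].example.com")[:3])
print(expand("web-[a:t].example.com")[-1])
```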

Hosts and Group Variables: Inside the Inventory

Recall how we can specify behavioral inventory parameters for Vagrant hosts:

vagrant1 ansible_host=127.0.0.1 ansible_port=2222
vagrant2 ansible_host=127.0.0.1 ansible_port=2200
vagrant3 ansible_host=127.0.0.1 ansible_port=2201

Those parameters are variables that have special meaning to Ansible. We can also define arbitrary variable names and associated values on hosts. For example, we could define a variable named color and set it to a value for each server:

amsterdam.example.com color=red
seoul.example.com color=green
sydney.example.com color=blue
toronto.example.com color=purple

We could then use this variable in a playbook, just like any other variable. Personally, your authors don’t often attach variables to specific hosts. On the other hand, we often associate variables with groups.

Circling back to our Django example, the web application and task queue service need to communicate with RabbitMQ and Postgres. We’ll assume that access to the Postgres database is secured both at the network layer (so only the web application and the task queue can reach the database) and by username and password. RabbitMQ is secured only by the network layer.

To set everything up, you can:

  • Configure the web servers with the hostname, port, username, and password of the primary Postgres server, and the name of the database.

  • Configure the task queues with the hostname, port, username, and password of the primary Postgres server, and the name of the database.

  • Configure the web servers with the hostname and port of the RabbitMQ server.

  • Configure the task queues with the hostname and port of the RabbitMQ server.

  • Configure the primary Postgres server with the hostname, port, and username and password of the replica Postgres server (production only).

This configuration info varies by environment, so it makes sense to define these as group variables on the production, staging, and Vagrant groups. Example 4-6 shows one way to do so in the inventory file. (A better way to store passwords is discussed in Chapter 8.)

Example 4-6. Specifying group variables in inventory
[all:vars]
ntp_server=ntp.ubuntu.com
[production:vars]
db_primary_host=frankfurt.example.com
db_primary_port=5432
db_replica_host=london.example.com
db_name=widget_production
db_user=widgetuser
db_password=pFmMxcyD;Fc6)6
rabbitmq_host=johannesburg.example.com
rabbitmq_port=5672
[staging:vars]
db_primary_host=chicago.example.com
db_primary_port=5432
db_name=widget_staging
db_user=widgetuser
db_password=L@4Ryz8cRUXedj
rabbitmq_host=chicago.example.com
rabbitmq_port=5672
[vagrant:vars]
db_primary_host=vagrant3
db_primary_port=5432
db_name=widget_vagrant
db_user=widgetuser
db_password=password
rabbitmq_host=vagrant3
rabbitmq_port=5672

Note how the group variables are organized into sections named [<group name>:vars]. Also, we’ve taken advantage of the all group (which, you’ll recall, Ansible creates automatically) to specify variables that don’t change across hosts.
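
The effect of these sections on a single host can be sketched as a dictionary merge. The code below is a deliberately simplified model that we wrote for illustration: effective_vars is a hypothetical helper, the data is a small subset of Example 4-6, and Ansible's real precedence rules are more detailed than "all first, then each matching group."

```python
GROUPS = {
    "production": ["frankfurt.example.com", "london.example.com"],
    "staging": ["chicago.example.com"],
}
GROUP_VARS = {
    "all": {"ntp_server": "ntp.ubuntu.com"},
    "production": {"db_primary_host": "frankfurt.example.com",
                   "db_name": "widget_production"},
    "staging": {"db_primary_host": "chicago.example.com",
                "db_name": "widget_staging"},
}


def effective_vars(host, groups, group_vars):
    """Merge the 'all' variables with those of every group containing host."""
    merged = dict(group_vars.get("all", {}))
    for group, members in groups.items():
        if host in members:
            merged.update(group_vars.get(group, {}))
    return merged


print(effective_vars("chicago.example.com", GROUPS, GROUP_VARS))
```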

Host and Group Variables: In Their Own Files

The inventory file is a reasonable place to put host and group variables if you don’t have too many hosts. But as your inventory gets larger, it gets more difficult to manage variables this way. Additionally, even though Ansible variables can hold Booleans, strings, lists, and dictionaries, in an inventory file you can specify only Booleans and strings.

Ansible offers a more scalable approach to keep track of host and group variables: you can create a separate variable file for each host and each group. Ansible expects these variable files to be in YAML format.

It looks for host variable files in a directory called host_vars and group variable files in a directory called group_vars. Ansible expects these directories to be either in the directory that contains your playbooks or in the directory that contains your inventory file. When both exist, the playbook directory has priority.

For example, if Lorin has a directory containing his playbooks at /home/lorin/playbooks/ with an inventory directory and hosts file at /home/lorin/inventory/hosts, he should put variables for the amsterdam.example.com host in the file /home/lorin/inventory/host_vars/amsterdam.example.com and variables for the production group in the file /home/lorin/inventory/group_vars/production (shown in Example 4-7).

Example 4-7. group_vars/production
---
db_primary_host: frankfurt.example.com
db_primary_port: 5432
db_replica_host: london.example.com
db_name: widget_production
db_user: widgetuser
db_password: 'pFmMxcyD;Fc6)6'
rabbitmq_host: johannesburg.example.com
rabbitmq_port: 5672
...

We can also use YAML dictionaries to represent these values, as shown in Example 4-8.

Example 4-8. group_vars/production, with dictionaries
---
db:
  user: widgetuser
  password: 'pFmMxcyD;Fc6)6'
  name: widget_production
  primary:
    host: frankfurt.example.com
    port: 5432
  replica:
    host: london.example.com
    port: 5432
rabbitmq:
  host: johannesburg.example.com
  port: 5672
...
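
The flat and nested layouts carry the same information. As a rough illustration, the sketch below flattens the nested structure of Example 4-8 back into underscore-joined names like those in Example 4-7. The flatten helper is our own invention, and the dictionary is typed in by hand rather than loaded from the YAML file.

```python
# The data from Example 4-8, written as a Python dictionary.
NESTED = {
    "db": {
        "user": "widgetuser",
        "password": "pFmMxcyD;Fc6)6",
        "name": "widget_production",
        "primary": {"host": "frankfurt.example.com", "port": 5432},
        "replica": {"host": "london.example.com", "port": 5432},
    },
    "rabbitmq": {"host": "johannesburg.example.com", "port": 5672},
}


def flatten(values, prefix=""):
    """Join nested keys with underscores: db -> primary -> host becomes db_primary_host."""
    flat = {}
    for key, value in values.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


for name, value in flatten(NESTED).items():
    print(f"{name}: {value}")
```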

If we choose YAML dictionaries, we access the variables with dot notation like this:

"{{ db.primary.host }}"

We can also access the variables in the dictionary like this:

"{{ db['primary']['host'] }}"

Contrast that to how we would otherwise access them:

"{{ db_primary_host }}"

If we want to break things out even further, Ansible lets us define group_vars/production as a directory instead of a file. We can place multiple YAML files into it that contain variable definitions. For example, we could put database-related variables in one file and the RabbitMQ-related variables in another file, as shown in Examples 4-9 and 4-10.

Example 4-9. group_vars/production/db
---
db:
  user: widgetuser
  password: 'pFmMxcyD;Fc6)6'
  name: widget_production
  primary:
    host: frankfurt.example.com
    port: 5432
  replica:
    host: london.example.com
    port: 5432
...

Example 4-10. group_vars/production/rabbitmq
---
rabbitmq:
  host: johannesburg.example.com
  port: 5672
...

It’s often better to start simple, rather than splitting variables out across too many files. In larger teams and projects, the value of separate files increases, since many people might need to pull and work in files at the same time.

Dynamic Inventory

Up until this point, we’ve been explicitly specifying all our hosts in our hosts inventory file. However, you might have a system external to Ansible that keeps track of your hosts. For example, if your hosts run on Amazon EC2, then EC2 tracks information about your hosts for you. You can retrieve this information through EC2’s web interface, its Query API, or command-line tools such as awscli. Other cloud providers have similar interfaces.

If you’re managing your own servers using an automated provisioning system such as Cobbler or Ubuntu Metal as a Service (MAAS), then your system is already keeping track of your servers. Or, maybe you have one of those fancy configuration management databases (CMDBs) where all of this information lives.

You don’t want to manually duplicate this information in your hosts file, because eventually that file will not jibe with your external system, which is the true source of information about your hosts. Ansible supports a feature called dynamic inventory that allows you to avoid this duplication.

If the inventory file is marked executable, Ansible will assume it is a dynamic inventory script and will execute the file instead of reading it.

Note

To mark a file as executable, use the chmod +x command. For example:

$ chmod +x vagrant.py

Inventory Plug-ins

Ansible comes with several executables that can connect to various cloud systems, provided you install the requirements and set up authentication. These plug-ins typically need a YAML configuration file in the inventory directory, as well as some environment variables or authentication files.

To see the list of available plug-ins:

$ ansible-doc -t inventory -l

To see plug-in-specific documentation and examples:

$ ansible-doc -t inventory <plugin name>

Amazon EC2

If you are using Amazon EC2, install the requirements:

$ pip3 install boto3 botocore

Create a file inventory/aws_ec2.yml with, at the very least:

plugin: aws_ec2

Azure Resource Manager

Install these requirements in a Python virtualenv with Ansible 2.9.xx:

$ pip3 install msrest msrestazure

Create a file inventory/azure_rm.yml with, at the very least:

plugin: azure_rm
platform: azure_rm
auth_source: auto
plain_host_names: true

The Interface for a Dynamic Inventory Script

An Ansible dynamic inventory script must support two command-line flags:

  • --host=<hostname> for showing host details

  • --list for listing groups

The script should also return its output as JSON, in a specific structure that Ansible can interpret.

Showing host details

To get the details of the individual host, Ansible will call an inventory script with the --host= argument:

$ ansible-inventory -i inventory/hosts --host=vagrant2

Note

Ansible includes a script that functions as a dynamic inventory script for the static inventory provided with the -i command-line argument: ansible-inventory.

The output should contain any host-specific variables, including behavioral parameters, like this:

{
    "ansible_host": "127.0.0.1",
    "ansible_port": 2200,
    "ansible_ssh_private_key_file": "~/.vagrant.d/insecure_private_key",
    "ansible_user": "vagrant"
} 

The output is a single JSON object; the names are variable names, and the values are the variable values.

Listing groups

Dynamic inventory scripts need to be able to list all of the groups and details about the individual hosts. In the GitHub repository that accompanies this book, there is an inventory script for the Vagrant hosts called vagrant.py. Ansible will call it like this to get a list of all of the groups:

$ ./vagrant.py --list

In the simplest form the output could look like this:

{"vagrant": ["vagrant1", "vagrant2", "vagrant3"]}

This output is a single JSON object; the names are Ansible group names, and the values are arrays of hostnames.

As an optimization, the output of --list can contain the values of the host variables for all of the hosts, which saves Ansible the trouble of making a separate --host invocation to retrieve the variables for the individual hosts.

To take advantage of this optimization, the --list command should return a key named _meta that contains the variables for each host, in this form:

"_meta": {
    "hostvars": {
      "vagrant1": {
        "ansible_user": "vagrant",
        "ansible_host": "127.0.0.1",
        "ansible_ssh_private_key_file": "~/.vagrant.d/insecure_private_key",
        "ansible_port": "2222"
      },
      "vagrant2": {
        "ansible_user": "vagrant",
        "ansible_host": "127.0.0.1",
        "ansible_ssh_private_key_file": "~/.vagrant.d/insecure_private_key",
        "ansible_port": "2200"
      },
      "vagrant3": {
        "ansible_user": "vagrant",
        "ansible_host": "127.0.0.1",
        "ansible_ssh_private_key_file": "~/.vagrant.d/insecure_private_key",
        "ansible_port": "2201"
      }
    }
}
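
Putting the two flags together, a dynamic inventory script can be quite small. The following is a minimal sketch of the interface only, assuming a hard-coded HOSTVARS dictionary in place of a real external source; the names HOSTVARS, list_groups, and host_details are ours, and this is not the vagrant.py script from the book's repository.

```python
import argparse
import json
import sys

# Hypothetical static data standing in for a real external system.
HOSTVARS = {
    "vagrant1": {"ansible_host": "127.0.0.1", "ansible_port": 2222,
                 "ansible_user": "vagrant"},
    "vagrant2": {"ansible_host": "127.0.0.1", "ansible_port": 2200,
                 "ansible_user": "vagrant"},
    "vagrant3": {"ansible_host": "127.0.0.1", "ansible_port": 2201,
                 "ansible_user": "vagrant"},
}


def list_groups():
    """Groups mapped to host names, plus _meta so --host calls are not needed."""
    return {"vagrant": sorted(HOSTVARS), "_meta": {"hostvars": HOSTVARS}}


def host_details(name):
    """Variables for one host; an empty object for unknown hosts."""
    return HOSTVARS.get(name, {})


def main(argv):
    parser = argparse.ArgumentParser(description="Sketch of a dynamic inventory")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--list", action="store_true", help="list groups")
    group.add_argument("--host", help="show details for one host")
    args = parser.parse_args(argv)
    if args.host:
        result = host_details(args.host)
    elif args.list:
        result = list_groups()
    else:
        result = {}
    return json.dumps(result, indent=4, sort_keys=True)


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

Saved as an executable file in the inventory directory, a script like this could be tried by hand with --list or --host=vagrant2 before pointing Ansible at it.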

Writing a Dynamic Inventory Script

One of the handy features of Vagrant is that you can see which machines are currently running by using the vagrant status command. Assuming we have a Vagrantfile that looks like Example 4-3, if we run vagrant status, the output would look like Example 4-11.

Example 4-11. Output of vagrant status
$ vagrant status
Current machine states:

vagrant1                 running (virtualbox)
vagrant2                 running (virtualbox)
vagrant3                 running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run 'vagrant status NAME'.

Because Vagrant already keeps track of machines for us, there’s no need for us to list them in an Ansible inventory file. Instead, we can write a dynamic inventory script that queries Vagrant about which machines are running. Once we’ve set up a dynamic inventory script for Vagrant, even if we alter our Vagrantfile to run different numbers of Vagrant machines, we won’t need to edit an Ansible inventory file.

Let’s work through an example of creating a dynamic inventory script that retrieves the details about hosts from Vagrant. Our dynamic inventory script is going to need to invoke the vagrant status command. The output shown in Example 4-11 is designed for humans to read. We can get a list of running hosts in a format that is easier for computers to parse with the --machine-readable flag, like so:

$ vagrant status --machine-readable

The output looks like this:

1620831617,vagrant1,metadata,provider,virtualbox
1620831617,vagrant2,metadata,provider,virtualbox
1620831618,vagrant3,metadata,provider,virtualbox
1620831619,vagrant1,provider-name,virtualbox
1620831619,vagrant1,state,running
1620831619,vagrant1,state-human-short,running
1620831619,vagrant1,state-human-long,The VM is running. To stop this 
VM%!(VAGRANT_COMMA) you can run `vagrant halt` to\nshut it down 
forcefully%!(VAGRANT_COMMA) or you can run `vagrant suspend` to 
simply\nsuspend the virtual machine. In either case%!(VAGRANT_COMMA) 
to restart it again%!(VAGRANT_COMMA)\nsimply run `vagrant up`.
1620831619,vagrant2,provider-name,virtualbox
1620831619,vagrant2,state,running
1620831619,vagrant2,state-human-short,running
1620831619,vagrant2,state-human-long,The VM is running. To stop this 
VM%!(VAGRANT_COMMA) you can run `vagrant halt` to\nshut it down 
forcefully%!(VAGRANT_COMMA) or you can run `vagrant suspend` to 
simply\nsuspend the virtual machine. In either case%!(VAGRANT_COMMA) 
to restart it again%!(VAGRANT_COMMA)\nsimply run `vagrant up`.
1620831620,vagrant3,provider-name,virtualbox
1620831620,vagrant3,state,running
1620831620,vagrant3,state-human-short,running
1620831620,vagrant3,state-human-long,The VM is running. To stop this 
VM%!(VAGRANT_COMMA) you can run `vagrant halt` to\nshut it down 
forcefully%!(VAGRANT_COMMA) or you can run `vagrant suspend` to 
simply\nsuspend the virtual machine. In either case%!(VAGRANT_COMMA) 
to restart it again%!(VAGRANT_COMMA)\nsimply run `vagrant up`.
1620831620,,ui,info,Current machine states:\n\nvagrant1                  
running (virtualbox)\nvagrant2         running (virtualbox)\nvagrant3
running (virtualbox)\n\nThis environment represents multiple VMs. The VMs 
are all listed\nabove with their current state. For more information about 
a specific\nVM%!(VAGRANT_COMMA) run `vagrant status NAME`
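
Each record is a comma-separated line of the form timestamp,target,type,data, and literal commas inside the data are escaped as %!(VAGRANT_COMMA). The following is a minimal sketch of parsing such records. It uses a hardcoded sample instead of a live vagrant call, and the helper names parse_records and running_machines are our own, not part of Vagrant:

```python
SAMPLE = """\
1620831617,vagrant1,metadata,provider,virtualbox
1620831619,vagrant1,state,running
1620831619,vagrant1,state-human-long,The VM is running. To stop this VM%!(VAGRANT_COMMA) you can run vagrant halt
1620831619,vagrant2,state,poweroff
1620831620,vagrant3,state,running
1620831620,,ui,info,Current machine states:
"""


def parse_records(text):
    """Yield (target, kind, data) tuples, one per well-formed line."""
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        _timestamp, target, kind = fields[:3]
        # Restore commas that Vagrant escaped inside the data fields.
        data = [f.replace("%!(VAGRANT_COMMA)", ",") for f in fields[3:]]
        yield target, kind, data


def running_machines(text):
    """Return the names of machines whose state is 'running'."""
    return [target for target, kind, data in parse_records(text)
            if kind == "state" and data[0] == "running"]


print(running_machines(SAMPLE))
```

Skipping lines with fewer than four fields keeps the parser from raising an exception on unexpected output; whether that is the right policy for your setup is a judgment call.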

To get details about a particular Vagrant machine, say, vagrant2, we would run this:

$ vagrant ssh-config vagrant2

The output looks like this:

Host vagrant2
  HostName 127.0.0.1
  User vagrant
  Port 2200
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/lorin/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL

Our dynamic inventory script will need to call these commands, parse the outputs, and output the appropriate JSON. We can use the Paramiko library to parse the output of vagrant ssh-config. First, install the Python Paramiko library with pip:

$ pip3 install --user paramiko

Here’s an interactive Python session that shows how to use the Paramiko library to do this:

$ python3
>>> import io
>>> import subprocess
>>> import paramiko
>>> cmd = ["vagrant", "ssh-config", "vagrant2"]
>>> ssh_config = subprocess.check_output(cmd).decode("utf-8")
>>> config = paramiko.SSHConfig()
>>> config.parse(io.StringIO(ssh_config))
>>> host_config = config.lookup("vagrant2")
>>> print(host_config)
{'hostname': '127.0.0.1', 'user': 'vagrant', 'port': '2200', 'userknownhostsfile': 
'/dev/null', 'stricthostkeychecking': 'no', 'passwordauthentication': 'no', 
'identityfile': ['/Users/bas/.vagrant.d/insecure_private_key'], 'identitiesonly': 
'yes', 'loglevel': 'FATAL'}
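
Paramiko understands the full ssh_config syntax. For the flat, single-host output that vagrant ssh-config prints, a standard-library fallback is also possible. The sketch below is only an illustration under that assumption: the function name parse_ssh_config is ours, and it ignores wildcards, Match blocks, and quoting, so it is not a replacement for Paramiko:

```python
SSH_CONFIG = """\
Host vagrant2
  HostName 127.0.0.1
  User vagrant
  Port 2200
  IdentityFile /Users/lorin/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
"""


def parse_ssh_config(text):
    """Map each Host name to a dict of its lowercased option names."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        key, _, value = line.partition(" ")
        key, value = key.lower(), value.strip()
        if key == "host":
            # Start collecting options for a new host entry.
            current = hosts.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return hosts


config = parse_ssh_config(SSH_CONFIG)["vagrant2"]
print(config["hostname"], config["port"])
```

Unlike Paramiko, this sketch returns identityfile as a plain string rather than a list.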

Example 4-12 shows our complete vagrant.py script.

Example 4-12. vagrant.py
#!/usr/bin/env python3
""" Vagrant inventory script """
# Adapted from Mark Mandel's implementation
# https://github.com/markmandel/vagrant_ansible_example

import argparse
import io
import json
import subprocess
import sys

import paramiko


def parse_args():
    """command-line options"""
    parser = argparse.ArgumentParser(description="Vagrant inventory script")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--list', action='store_true')
    group.add_argument('--host')
    return parser.parse_args()


def list_running_hosts():
    """vagrant.py --list function"""
    cmd = ["vagrant", "status", "--machine-readable"]
    status = subprocess.check_output(cmd).rstrip().decode("utf-8")
    hosts = []
    for line in status.splitlines():
        (_, host, key, value) = line.split(',')[:4]
        if key == 'state' and value == 'running':
            hosts.append(host)
    return hosts


def get_host_details(host):
    """vagrant.py --host <hostname> function"""
    cmd = ["vagrant", "ssh-config", host]
    ssh_config = subprocess.check_output(cmd).decode("utf-8")
    config = paramiko.SSHConfig()
    config.parse(io.StringIO(ssh_config))
    host_config = config.lookup(host)
    return {'ansible_host': host_config['hostname'],
            'ansible_port': host_config['port'],
            'ansible_user': host_config['user'],
            'ansible_private_key_file': host_config['identityfile'][0]}


def main():
    """main"""
    args = parse_args()
    if args.list:
        hosts = list_running_hosts()
        json.dump({'vagrant': hosts}, sys.stdout)
    else:
        details = get_host_details(args.host)
        json.dump(details, sys.stdout)


if __name__ == '__main__':
    main()
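
With this script, Ansible runs vagrant.py --host once for each machine that --list returns. Ansible also accepts a top-level _meta key containing a hostvars mapping in the --list output, in which case it skips the per-host calls. Here is a hedged sketch of building that structure. The helper build_inventory is hypothetical, and fake_details stands in for get_host_details so that the example runs without Vagrant:

```python
import json


def build_inventory(hosts, get_details):
    """Return --list output that embeds every host's variables."""
    return {
        "vagrant": {"hosts": list(hosts)},
        "_meta": {"hostvars": {h: get_details(h) for h in hosts}},
    }


def fake_details(host):
    """Stand-in for get_host_details(); the values are made up."""
    ports = {"vagrant1": "2222", "vagrant2": "2200"}
    return {"ansible_host": "127.0.0.1", "ansible_port": ports[host]}


inventory = build_inventory(["vagrant1", "vagrant2"], fake_details)
print(json.dumps(inventory, indent=2))
```

In vagrant.py, the same idea would mean passing list_running_hosts() and get_host_details to such a helper inside the --list branch.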

Breaking the Inventory into Multiple Files

If you want to have both a regular inventory file and a dynamic inventory script (or, really, any combination of static and dynamic inventory files), just put them all in the same directory and configure Ansible to use that directory as the inventory. You can do this via the inventory parameter in ansible.cfg or by using the -i flag on the command line. Ansible will process all of the files and merge the results into a single inventory.

This means that you can create one inventory directory to use with Ansible on the command line with hosts running in Vagrant, Amazon EC2, Google Cloud Platform, Microsoft Azure, or wherever you need them!

For example, Bas’s directory structure looks like this:

  • inventory/aws_ec2.yml
  • inventory/azure_rm.yml
  • inventory/group_vars/vagrant
  • inventory/group_vars/staging
  • inventory/group_vars/production
  • inventory/hosts
  • inventory/vagrant.py

Adding Entries at Runtime with add_host and group_by

Ansible will let you add hosts and groups to the inventory during the execution of a playbook. This is useful when managing dynamic clusters, such as Redis Sentinel.

add_host

The add_host module adds a host to the inventory; this is useful if you’re using Ansible to provision new virtual machine instances inside an infrastructure-as-a-service cloud.

Invoking the module looks like this:

- name: Add the host
  add_host:
    name: hostname
    groups: web,staging
    myvar: myval

Specifying the list of groups and additional variables is optional.

Here’s the add_host module in action, bringing up a new Vagrant machine and then configuring the machine:

---
- name: Provision a Vagrant machine
  hosts: localhost
  vars:
    box: centos/stream8

  tasks:
    - name: Create a Vagrantfile
      command: "vagrant init {{ box }}"
      args:
        creates: Vagrantfile

    - name: Bring up the vagrant machine
      command: vagrant up
      args:
        creates: .vagrant/machines/default/virtualbox/box_meta

    - name: Add the vagrant machine to the inventory
      add_host:
        name: default
        ansible_host: 127.0.0.1
        ansible_port: 2222
        ansible_user: vagrant
        ansible_private_key_file: >-
          .vagrant/machines/default/virtualbox/private_key

- name: Do something to the vagrant machine
  hosts: default
  tasks:
    # The list of tasks would go here
    - name: ping
      ping:
...

Note

The add_host module adds the host only for the duration of the execution of the playbook. It does not modify your inventory file.

When we provision hosts from a playbook, we like to split the work into two plays. The first play runs against localhost and provisions the hosts, and the second play configures the hosts.

Note that we use the creates: Vagrantfile argument in this task:

- name: Create a Vagrantfile
  command: "vagrant init {{ box }}"
  args:
    creates: Vagrantfile

This tells Ansible that if the Vagrantfile is present, there is no need to run the command again. Ensuring that the (potentially nonidempotent) command is run only once is a way of achieving idempotence in a playbook that invokes the command module. The task that runs vagrant up uses creates in the same way.
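
The effect of creates can be pictured as a guard around the command. The following illustrates the idea only; it is not how Ansible implements the command module. The helper run_unless_exists is hypothetical, and the command here creates a marker file through the Python interpreter so that the sketch runs without Vagrant:

```python
import os
import subprocess
import sys
import tempfile


def run_unless_exists(cmd, creates):
    """Run cmd only when the path `creates` is absent.

    Returns True if the command ran, False if it was skipped.
    """
    if os.path.exists(creates):
        return False
    subprocess.run(cmd, check=True)
    return True


with tempfile.TemporaryDirectory() as workdir:
    marker = os.path.join(workdir, "Vagrantfile")
    # Stand-in for `vagrant init`: create an empty marker file.
    cmd = [sys.executable, "-c",
           "import sys; open(sys.argv[1], 'w').close()", marker]
    first = run_unless_exists(cmd, marker)
    second = run_unless_exists(cmd, marker)

print(first, second)
```

The first call runs the command and the second is skipped, which is the behavior that makes the task safe to repeat.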

group_by

Ansible’s group_by module allows you to create new groups while a playbook is executing. Each group is named after the value of a variable that has been set on the host; variables that Ansible discovers about a host are called facts. (Chapter 5 covers facts in more detail.)

If Ansible fact gathering is enabled, Ansible will associate a set of variables with a host. For example, the ansible_machine variable will be i386 for 32-bit x86 machines and x86_64 for 64-bit x86 machines. If Ansible is interacting with a mix of such hosts, a group_by task keyed on this variable creates i386 and x86_64 groups.

If we’d rather group our hosts by Linux distribution (for example, Ubuntu or CentOS), we can use the ansible_facts.distribution fact:

- name: Create groups based on Linux distribution
  group_by:
    key: "{{ ansible_facts.distribution }}"
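
Conceptually, group_by evaluates the key for each host and places the host in a group named after the result. Here is a rough Python analogy using made-up hosts and fact values; it only mimics the grouping and says nothing about Ansible's internals:

```python
from collections import defaultdict

# Hypothetical facts, as fact gathering might report them.
FACTS = {
    "web1": {"distribution": "Ubuntu", "machine": "x86_64"},
    "web2": {"distribution": "CentOS", "machine": "x86_64"},
    "db1": {"distribution": "Ubuntu", "machine": "i386"},
}


def group_by(facts, key):
    """Group hostnames by the value of one fact."""
    groups = defaultdict(list)
    for host, host_facts in facts.items():
        groups[host_facts[key]].append(host)
    return dict(groups)


print(group_by(FACTS, "distribution"))
print(group_by(FACTS, "machine"))
```

A host can only land in groups whose names match values that actually occur, so a later play that targets a group such as CentOS simply has no hosts if none report that distribution.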

In Example 4-13, we use group_by to create separate groups for our Ubuntu and CentOS hosts, then we use the apt module to install packages onto Ubuntu and the yum module to install packages onto CentOS.

Example 4-13. Creating ad hoc groups based on Linux distribution
---

- name: Group hosts by distribution
  hosts: all
  gather_facts: true
  tasks:
    - name: Create groups based on distro
      group_by:
        key: "{{ ansible_facts.distribution }}"

- name: Do something to Ubuntu hosts
  hosts: Ubuntu
  become: true
  tasks:
    - name: Install jdk and jre
      apt:
        update_cache: true
        name:
          - openjdk-11-jdk-headless
          - openjdk-11-jre-headless

- name: Do something else to CentOS hosts
  hosts: CentOS
  become: true
  tasks:
    - name: Install jdk
      yum:
        name:
          - java-11-openjdk-headless
          - java-11-openjdk-devel

Conclusion

That about does it for Ansible’s inventory. It is a very flexible object that helps describe your infrastructure and the way you want to use it. The inventory can be as simple as one text file or as complex as you can handle.

The next chapter covers how to use variables.

1 This term has been popularized by Randy Bias of Cloudscaling.
