Chapter 4. Inventory: Describing Your Servers
So far, we’ve been working with only one server (or host, as Ansible calls it). The simplest inventory is a comma-separated list of hostnames, which you can try even without a server:
$ ansible all -i 'localhost,' -a date
In reality, you’re going to be managing multiple hosts. The collection of hosts that Ansible knows about is called the inventory. In this chapter, you will learn how to describe a set of hosts as an Ansible inventory by creating an inventory that contains multiple machines.
Your ansible.cfg file should look like Example 4-1, which enables all inventory plug-ins explicitly.
Example 4-1. ansible.cfg
[defaults]
inventory = inventory

[inventory]
enable_plugins = host_list, script, auto, yaml, ini, toml
In this chapter, we will use a directory named inventory for the inventory examples. The Ansible inventory is a very flexible object: it can be a file (in several formats), a directory, or an executable, and some executables are bundled as plug-ins. Inventory plug-ins allow us to point at data sources, like your cloud provider, to compile the inventory. An inventory can be stored separately from your playbooks. This means that you can create one inventory directory to use with Ansible on the command line, with hosts running in Vagrant, Amazon EC2, Google Cloud Platform, or Microsoft Azure, or wherever you like!
Note
Serge van Ginderachter is the most knowledgeable person to read on Ansible inventory. See his blog for in-depth details.
Inventory/Hosts Files
The default way to describe your hosts in Ansible is to list them in text files, called inventory hosts files. The simplest form is just a list of hostnames in a file named hosts, as shown in Example 4-2.
Example 4-2. A very simple inventory file
frankfurt.example.com
helsinki.example.com
hongkong.example.com
johannesburg.example.com
london.example.com
newyork.example.com
seoul.example.com
sydney.example.com
Ansible automatically adds one host to the inventory by default: localhost. It understands that localhost refers to your local machine, with which it will interact directly rather than connecting by SSH.
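For example, you can target the implicit localhost in an ad hoc command without defining any inventory; Ansible runs it locally rather than over SSH:

$ ansible localhost -a date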
Preliminaries: Multiple Vagrant Machines
To talk about inventory, you’ll need to interact with multiple hosts. Let’s configure Vagrant to bring up three hosts. We’ll unimaginatively call them vagrant1, vagrant2, and vagrant3.
Before you create a new Vagrantfile for this chapter, make sure you destroy your existing virtual machine(s) by running the following:
$ vagrant destroy --force
If you don’t include the --force option, Vagrant will prompt you to confirm that you want to destroy each virtual machine listed in the Vagrantfile.
Next, create a new Vagrantfile that looks like Example 4-3.
Example 4-3. Vagrantfile with three servers
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # Use the same key for each machine
  config.ssh.insert_key = false

  config.vm.define "vagrant1" do |vagrant1|
    vagrant1.vm.box = "ubuntu/focal64"
    vagrant1.vm.network "forwarded_port", guest: 80, host: 8080
    vagrant1.vm.network "forwarded_port", guest: 443, host: 8443
  end
  config.vm.define "vagrant2" do |vagrant2|
    vagrant2.vm.box = "ubuntu/focal64"
    vagrant2.vm.network "forwarded_port", guest: 80, host: 8081
    vagrant2.vm.network "forwarded_port", guest: 443, host: 8444
  end
  config.vm.define "vagrant3" do |vagrant3|
    vagrant3.vm.box = "centos/stream8"
    vagrant3.vm.network "forwarded_port", guest: 80, host: 8082
    vagrant3.vm.network "forwarded_port", guest: 443, host: 8445
  end
end
Vagrant, from version 1.7 on, defaults to using a different SSH key for each host. Example 4-3 contains the line to revert to the earlier behavior of using the same SSH key for each host:
config.ssh.insert_key = false
Using the same key on each host simplifies our Ansible setup because we can specify a single SSH key in the configuration.
For now, let’s assume that each of these servers can potentially be a web server, so Example 4-3 maps ports 80 and 443 inside each Vagrant machine to a port on the local machine.
We should be able to bring up the virtual machines by running the following:
$ vagrant up
If all goes well, the output should look something like this:
Bringing machine 'vagrant1' up with 'virtualbox' provider...
Bringing machine 'vagrant2' up with 'virtualbox' provider...
Bringing machine 'vagrant3' up with 'virtualbox' provider...
...
    vagrant1: 80 (guest) => 8080 (host) (adapter 1)
    vagrant1: 443 (guest) => 8443 (host) (adapter 1)
    vagrant1: 22 (guest) => 2222 (host) (adapter 1)
==> vagrant1: Running 'pre-boot' VM customizations...
==> vagrant1: Booting VM...
==> vagrant1: Waiting for machine to boot. This may take a few minutes...
    vagrant1: SSH address: 127.0.0.1:2222
    vagrant1: SSH username: vagrant
    vagrant1: SSH auth method: private key
==> vagrant1: Machine booted and ready!
==> vagrant1: Checking for guest additions in VM...
==> vagrant1: Mounting shared folders...
    vagrant1: /vagrant => /Users/bas/code/ansible/ansiblebook/ansiblebook/ch03
Next, we need to know what ports on the local machine map to the SSH port (22) inside each VM. Recall that we can get that information by running the following:
$ vagrant ssh-config
The output should look something like this:
Host vagrant1
  HostName 127.0.0.1
  User vagrant
  Port 2222
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/lorin/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL

Host vagrant2
  HostName 127.0.0.1
  User vagrant
  Port 2200
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/lorin/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL

Host vagrant3
  HostName 127.0.0.1
  User vagrant
  Port 2201
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/lorin/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL
A lot of the ssh-config information is repetitive and can be reduced. The information that differs per host is the port: vagrant1 uses port 2222, vagrant2 uses port 2200, and vagrant3 uses port 2201.
Ansible uses your local SSH client by default, which means that it will understand any aliases that you set up in your SSH config file. Therefore, we use a wildcard alias in the file ~/.ssh/config:
Host vagrant*
  Hostname 127.0.0.1
  User vagrant
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile ~/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL
Modify your inventory/hosts file so it looks like this:
vagrant1 ansible_port=2222
vagrant2 ansible_port=2200
vagrant3 ansible_port=2201
Now, make sure that you can access these machines. For example, to get information about the network interface for vagrant2, run the following:
$ ansible vagrant2 -a "ip addr show dev enp0s3"
Your output should look something like this:
vagrant2 | CHANGED | rc=0 >>
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 02:1e:de:45:2c:c8 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 86178sec preferred_lft 86178sec
    inet6 fe80::1e:deff:fe45:2cc8/64 scope link
       valid_lft forever preferred_lft forever
Behavioral Inventory Parameters
To describe our Vagrant machines in the Ansible inventory file, we had to explicitly specify the port (2222, 2200, or 2201) to which Ansible’s SSH client should connect. Ansible calls such variables behavioral inventory parameters, and there are several of them you can use when you need to override the Ansible defaults for a host (see Table 4-1).
Table 4-1. Behavioral inventory parameters

Name | Default | Description
---|---|---
ansible_host | Name of host | Hostname or IP address to SSH to
ansible_port | 22 | Port to SSH to
ansible_user | $USER | User to SSH as
ansible_password | (None) | Password to use for SSH authentication
ansible_connection | smart | How Ansible will connect to host (see the following section)
ansible_ssh_private_key_file | (None) | SSH private key to use for SSH authentication
ansible_shell_type | sh | Shell to use for commands (see the following section)
ansible_python_interpreter | /usr/bin/python | Python interpreter on host (see the following section)
ansible_*_interpreter | (None) | Like ansible_python_interpreter for other languages (see the following section)
For some of these options, the meaning is obvious from the name, but others require more explanation:
ansible_connection

Ansible supports multiple transports, which are mechanisms that Ansible uses to connect to the host. The default transport, smart, will check whether the locally installed SSH client supports a feature called ControlPersist. If the SSH client supports ControlPersist, Ansible will use the local SSH client. If not, the smart transport will fall back to using a Python-based SSH client library called Paramiko.

ansible_shell_type

Ansible works by making SSH connections to remote machines and then invoking scripts. By default, Ansible assumes that the remote shell is the Bourne shell located at /bin/sh, and will generate the appropriate command-line parameters to work with it. It creates temporary directories to store these scripts. Ansible also accepts csh, fish, and (on Windows) powershell as valid values for this parameter. Ansible doesn’t work with restricted shells.

ansible_python_interpreter

Ansible needs to know the location of the Python interpreter on the remote machine. You might want to change this to choose a version that works for you. The easiest way to run Ansible under Python 3 is to install it with pip3 and set this:

ansible_python_interpreter="/usr/bin/env python3"

ansible_*_interpreter

If you are using a custom module that is not written in Python, you can use this parameter to specify the location of the interpreter (such as /usr/bin/ruby). We’ll cover this in Chapter 12.
Changing Behavioral Parameter Defaults
You can override some of the behavioral parameter default values in the inventory file, or you can override them in the defaults section of the ansible.cfg file (Table 4-2). Consider where you change these parameters. Are the changes a personal choice, or does the change apply to your whole team? Does a part of your inventory need a different setting? Remember that you can configure SSH preferences in the ~/.ssh/config file.
Table 4-2. Defaults that can be overridden in ansible.cfg

Behavioral inventory parameter | ansible.cfg option
---|---
ansible_port | remote_port
ansible_user | remote_user
ansible_ssh_private_key_file | ssh_private_key_file
ansible_shell_type | executable (see the following paragraph)
The ansible.cfg executable config option is not exactly the same as the ansible_shell_type behavioral inventory parameter. The executable option specifies the full path of the shell to use on the remote machine (for example, /usr/local/bin/fish). Ansible will look at the base name of this path (in this case, fish) and use that as the default value for ansible_shell_type.
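For example, a team whose servers all accept SSH on a nonstandard port under a shared deployment user could set those defaults once in ansible.cfg rather than on every inventory line. A minimal sketch, with illustrative values that are not the settings used elsewhere in this chapter:

[defaults]
inventory = inventory
remote_user = deploy
remote_port = 2202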
Groups and Groups and Groups
We typically want to perform configuration actions on groups of hosts, rather than on an individual host. Ansible automatically defines a group called all (or *), which includes all the hosts in the inventory. For example, we can check whether the clocks on the machines are roughly synchronized by running the following:
$ ansible all -a "date"
or
$ ansible '*' -a "date"
The output on Bas’s system looks like this:
vagrant2 | CHANGED | rc=0 >>
Wed 12 May 2021 01:37:47 PM UTC
vagrant1 | CHANGED | rc=0 >>
Wed 12 May 2021 01:37:47 PM UTC
vagrant3 | CHANGED | rc=0 >>
Wed 12 May 2021 01:37:47 PM UTC
We can define our own groups in the inventory hosts file. Ansible uses the .ini file format for inventory hosts files; it groups configuration values into sections.
Here’s how to specify that our vagrant hosts are in a group called vagrant, along with the other example hosts mentioned at the beginning of the chapter:

frankfurt.example.com
helsinki.example.com
hongkong.example.com
johannesburg.example.com
london.example.com
newyork.example.com
seoul.example.com
sydney.example.com

[vagrant]
vagrant1 ansible_port=2222
vagrant2 ansible_port=2200
vagrant3 ansible_port=2201
We could alternately list the Vagrant hosts at the top and then also in a group, like this:
frankfurt.example.com
helsinki.example.com
hongkong.example.com
johannesburg.example.com
london.example.com
newyork.example.com
seoul.example.com
sydney.example.com
vagrant1 ansible_port=2222
vagrant2 ansible_port=2200
vagrant3 ansible_port=2201

[vagrant]
vagrant1
vagrant2
vagrant3
You can use groups in any way that suits you: they can overlap or be nested. The order in which they appear does not matter, except for human readability.
Example: Deploying a Django App
Imagine you’re responsible for deploying a Django-based web application that processes long-running jobs. The app needs to support the following services:
- The actual Django web app itself, run by a Gunicorn HTTP server
- An NGINX web server, which will sit in front of Gunicorn and serve static assets
- A Celery task queue that will execute long-running jobs on behalf of the web app
- A RabbitMQ message queue that serves as the backend for Celery
- A Postgres database that serves as the persistent store
In later chapters, we will work through a detailed example of deploying this kind of Django-based application, although our example won’t use Celery or RabbitMQ. For now, we need to deploy this application into three different environments: production (the real thing), staging (for testing on hosts that our team has shared access to), and Vagrant (for local testing).
When we deploy to production, we want the entire system to respond quickly and reliably, so we do the following:
- Run the web application on multiple hosts for better performance and put a load balancer in front of them
- Run task queue servers on multiple hosts for better performance
- Put Gunicorn, Celery, RabbitMQ, and Postgres all on separate servers
- Use two Postgres hosts: a primary and a replica
Assuming we have one load balancer, three web servers, three task queues, one RabbitMQ server, and two database servers, that’s 10 hosts we need to deal with (Figure 4-1).
For our staging environment, we want to use fewer hosts than we do in production to save costs, since it’s going to see a lot less activity than production will. Let’s say we decide to use only two hosts for staging; we’ll put the web server and task queue on one staging host, and RabbitMQ and Postgres on the other.
For our local Vagrant environment, we decide to use three servers: one for the web app, one for a task queue, and one that will contain RabbitMQ and Postgres.
Example 4-4 shows a sample inventory file that groups servers by environment (production, staging, Vagrant) and by function (web server, task queue, etc.).
Example 4-4. Inventory file for deploying a Django app
[production]
frankfurt.example.com
helsinki.example.com
hongkong.example.com
johannesburg.example.com
london.example.com
newyork.example.com
seoul.example.com
sydney.example.com
tokyo.example.com
toronto.example.com

[staging]
amsterdam.example.com
chicago.example.com

[lb]
helsinki.example.com

[web]
amsterdam.example.com
seoul.example.com
sydney.example.com
toronto.example.com
vagrant1

[task]
amsterdam.example.com
hongkong.example.com
johannesburg.example.com
newyork.example.com
vagrant2

[rabbitmq]
chicago.example.com
tokyo.example.com
vagrant3

[db]
chicago.example.com
frankfurt.example.com
london.example.com
vagrant3
We could have first listed all of the servers at the top of the inventory file, without specifying a group, but that isn’t necessary, and that would’ve made this file even longer.
Note that we need to specify the behavioral inventory parameters for the Vagrant instances only once.
Aliases and Ports
We have described our Vagrant hosts like this:
[vagrant]
vagrant1 ansible_port=2222
vagrant2 ansible_port=2200
vagrant3 ansible_port=2201
The names vagrant1, vagrant2, and vagrant3 here are aliases. They are not the real hostnames, just useful names for referring to these hosts. Ansible resolves hostnames using the inventory, your SSH config file, /etc/hosts, and DNS. This flexibility is useful in development but can be a cause of confusion.
Ansible also supports using <hostname>:<port> syntax when specifying hosts, so we could replace the line that contains vagrant1 with 127.0.0.1:2222 (Example 4-5).
Example 4-5. This doesn’t work
[vagrant]
127.0.0.1:2222
127.0.0.1:2200
127.0.0.1:2201
However, we can’t actually run what you see in Example 4-5. The reason is that Ansible’s inventory can associate only a single host with 127.0.0.1, so the Vagrant group would contain only one host instead of three.
Groups of Groups
Ansible also allows you to define groups that are made up of other groups. For example, since both the web servers and the task queue servers will need Django and its dependencies, it might be useful to define a django group that contains both. You would add this to the inventory file:

[django:children]
web
task
Note that the syntax changes when you are specifying a group of groups, as opposed to a group of hosts. That’s so Ansible knows to interpret web and task as groups and not as hosts.
Numbered Hosts (Pets Versus Cattle)
The inventory file you saw back in Example 4-4 looks complex. It describes 15 hosts, which doesn’t sound like a large number in this cloudy, scale-out world. However, dealing with 15 hosts in the inventory file can be cumbersome, because each host has a completely different hostname.
Bill Baker of Microsoft came up with the distinction between treating servers as pets versus treating them like cattle.1 We give pets distinctive names and treat and care for them as individuals; with cattle, though, we refer to them by identification number and treat them as livestock.
The “cattle” approach to servers is much more scalable, and Ansible supports it well by supporting numeric patterns. For example, if your 20 servers are named web1.example.com, web2.example.com, and so on, then you can specify them in the inventory file like this:
[web]
web[1:20].example.com
If you prefer to have a leading zero (such as web01.example.com), specify that in the range, like this:
[web]
web[01:20].example.com
Ansible also supports using alphabetic characters to specify ranges. If you want to use the convention web-a.example.com, web-b.example.com, and so on, for your 20 servers, then you can do this:
[web]
web-[a:t].example.com
Hosts and Group Variables: Inside the Inventory
Recall how we can specify behavioral inventory parameters for Vagrant hosts:
vagrant1 ansible_host=127.0.0.1 ansible_port=2222
vagrant2 ansible_host=127.0.0.1 ansible_port=2200
vagrant3 ansible_host=127.0.0.1 ansible_port=2201
Those parameters are variables that have special meaning to Ansible. We can also define arbitrary variable names and associated values on hosts. For example, we could define a variable named color and set it to a value for each server:

amsterdam.example.com color=red
seoul.example.com color=green
sydney.example.com color=blue
toronto.example.com color=purple
We could then use this variable in a playbook, just like any other variable. Personally, your authors don’t often attach variables to specific hosts. On the other hand, we often associate variables with groups.
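As a quick illustration, a task could interpolate the variable like any other. A minimal sketch using the debug module:

- name: Show each host's color
  debug:
    msg: "{{ inventory_hostname }} is {{ color }}"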
Circling back to our Django example, the web application and task queue service need to communicate with RabbitMQ and Postgres. We’ll assume that access to the Postgres database is secured both at the network layer (so only the web application and the task queue can reach the database) and by username and password. RabbitMQ is secured only by the network layer.
To set everything up, you can:
- Configure the web servers with the hostname, port, username, and password of the primary Postgres server, and the name of the database.
- Configure the task queues with the hostname, port, username, and password of the primary Postgres server, and the name of the database.
- Configure the web servers with the hostname and port of the RabbitMQ server.
- Configure the task queues with the hostname and port of the RabbitMQ server.
- Configure the primary Postgres server with the hostname, port, username, and password of the replica Postgres server (production only).
This configuration info varies by environment, so it makes sense to define these as group variables on the production, staging, and Vagrant groups. Example 4-6 shows one way to do so in the inventory file. (A better way to store passwords is discussed in Chapter 8).
Example 4-6. Specifying group variables in inventory
[all:vars]
ntp_server=ntp.ubuntu.com

[production:vars]
db_primary_host=frankfurt.example.com
db_primary_port=5432
db_replica_host=london.example.com
db_name=widget_production
db_user=widgetuser
db_password=pFmMxcyD;Fc6)6
rabbitmq_host=johannesburg.example.com
rabbitmq_port=5672

[staging:vars]
db_primary_host=chicago.example.com
db_primary_port=5432
db_name=widget_staging
db_user=widgetuser
db_password=L@4Ryz8cRUXedj
rabbitmq_host=chicago.example.com
rabbitmq_port=5672

[vagrant:vars]
db_primary_host=vagrant3
db_primary_port=5432
db_name=widget_vagrant
db_user=widgetuser
db_password=password
rabbitmq_host=vagrant3
rabbitmq_port=5672
Note how the group variables are organized into sections named [<group name>:vars]. Also, we’ve taken advantage of the all group (which, you’ll recall, Ansible creates automatically) to specify variables that don’t change across hosts.
Host and Group Variables: In Their Own Files
The inventory file is a reasonable place to put host and group variables if you don’t have too many hosts. But as your inventory gets larger, it gets more difficult to manage variables this way. Additionally, even though Ansible variables can hold Booleans, strings, lists, and dictionaries, in an inventory file you can specify only Booleans and strings.
Ansible offers a more scalable approach to keep track of host and group variables: you can create a separate variable file for each host and each group. Ansible expects these variable files to be in YAML format.
It looks for host variable files in a directory called host_vars and group variable files in a directory called group_vars. Ansible expects these directories to be either in the directory that contains your playbooks or in the directory alongside your inventory file. When both exist, the playbook directory takes priority.
For example, if Lorin has a directory containing his playbooks at /home/lorin/playbooks/ with an inventory directory and hosts file at /home/lorin/inventory/hosts, he should put variables for the amsterdam.example.com host in the file /home/lorin/inventory/host_vars/amsterdam.example.com and variables for the production group in the file /home/lorin/inventory/group_vars/production (shown in Example 4-7).
Example 4-7. group_vars/production
---
db_primary_host: frankfurt.example.com
db_primary_port: 5432
db_replica_host: london.example.com
db_name: widget_production
db_user: widgetuser
db_password: 'pFmMxcyD;Fc6)6'
rabbitmq_host: johannesburg.example.com
rabbitmq_port: 5672
...
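Host variables follow the same pattern: a YAML file named after the host goes into host_vars. A minimal sketch for the color variable from earlier, in host_vars/amsterdam.example.com:

---
color: red
...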
We can also use YAML dictionaries to represent these values, as shown in Example 4-8.
Example 4-8. group_vars/production, with dictionaries
---
db:
  user: widgetuser
  password: 'pFmMxcyD;Fc6)6'
  name: widget_production
  primary:
    host: frankfurt.example.com
    port: 5432
  replica:
    host: london.example.com
    port: 5432
rabbitmq:
  host: johannesburg.example.com
  port: 5672
...
If we choose YAML dictionaries, we access the variables with dot notation like this:
"{{ db.primary.host }}"
We can also access the variables in the dictionary like this:
"{{ db['primary']['host'] }}"
Contrast that to how we would otherwise access them:
"{{ db_primary_host }}"
If we want to break things out even further, Ansible lets us define group_vars/production as a directory instead of a file. We can place multiple YAML files into it that contain variable definitions. For example, we could put database-related variables in one file and the RabbitMQ-related variables in another file, as shown in Examples 4-9 and 4-10.
Example 4-9. group_vars/production/db
---
db:
  user: widgetuser
  password: 'pFmMxcyD;Fc6)6'
  name: widget_production
  primary:
    host: frankfurt.example.com
    port: 5432
  replica:
    host: london.example.com
    port: 5432
...
Example 4-10. group_vars/production/rabbitmq
---
rabbitmq:
  host: johannesburg.example.com
  port: 5672
...
It’s often better to start simple, rather than splitting variables out across too many files. In larger teams and projects, the value of separate files increases, since many people might need to pull and work in files at the same time.
Dynamic Inventory
Up until this point, we’ve been explicitly specifying all our hosts in our hosts inventory file. However, you might have a system external to Ansible that keeps track of your hosts. For example, if your hosts run on Amazon EC2, then EC2 tracks information about your hosts for you. You can retrieve this information through EC2’s web interface, its Query API, or command-line tools such as awscli. Other cloud providers have similar interfaces.
If you’re managing your own servers using an automated provisioning system such as Cobbler or Ubuntu Metal as a Service (MAAS), then your system is already keeping track of your servers. Or, maybe you have one of those fancy configuration management databases (CMDBs) where all of this information lives.
You don’t want to manually duplicate this information in your hosts file, because eventually that file will not jibe with your external system, which is the true source of information about your hosts. Ansible supports a feature called dynamic inventory that allows you to avoid this duplication.
If the inventory file is marked executable, Ansible will assume it is a dynamic inventory script and will execute the file instead of reading it.
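For example, assuming the vagrant.py script shown later in this chapter sits in your inventory directory, you would mark it executable like this:

$ chmod +x inventory/vagrant.py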
Inventory Plug-ins
Ansible ships with several inventory plug-ins that can connect to various cloud systems, provided you install their requirements and set up authentication. These plug-ins typically need a YAML configuration file in the inventory directory, as well as some environment variables or authentication files.
To see the list of available plug-ins:
$ ansible-doc -t inventory -l
To see plug-in-specific documentation and examples:
$ ansible-doc -t inventory <plugin name>
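As an illustration, such a configuration file is usually only a few lines of YAML. Here is a minimal sketch of what inventory/aws_ec2.yml might contain, assuming the amazon.aws collection, the boto3 library, and AWS credentials are already set up; the region and grouping key are placeholders, not values used elsewhere in this chapter:

plugin: amazon.aws.aws_ec2
regions:
  - eu-central-1
keyed_groups:
  - key: tags.Name
    prefix: tag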
The Interface for a Dynamic Inventory Script
An Ansible dynamic inventory script must support two command-line flags:

- --host=<hostname> for showing host details
- --list for listing groups

It should also return output in JSON format, with a specific structure that Ansible can interpret.
Showing host details
To get the details of an individual host, Ansible will call an inventory script with the --host= argument:
$ ansible-inventory -i inventory/hosts --host=vagrant2
Note
Ansible includes a command, ansible-inventory, that functions as a dynamic inventory script for the static inventory provided with the -i command-line argument.
The output should contain any host-specific variables, including behavioral parameters, like this:
{ "ansible_host": "127.0.0.1", "ansible_port": 2200, "ansible_ssh_private_key_file": "~/.vagrant.d/insecure_private_key", "ansible_user": "vagrant" }
The output is a single JSON object; the names are variable names, and the values are the variable values.
Listing groups
Dynamic inventory scripts need to be able to list all of the groups and details about the individual hosts. In the GitHub repository that accompanies this book, there is an inventory script for the Vagrant hosts called vagrant.py. Ansible will call it like this to get a list of all of the groups:
$ ./vagrant.py --list
In the simplest form the output could look like this:
{"vagrant": ["vagrant1", "vagrant2", "vagrant3"]}
This output is a single JSON object; the names are Ansible group names, and the values are arrays of hostnames.
As an optimization, the --list command can contain the values of the host variables for all of the hosts, which saves Ansible the trouble of making a separate --host invocation to retrieve the variables for each individual host.

To take advantage of this optimization, the --list command should return a key named _meta that contains the variables for each host, in this form:
"_meta": { "hostvars": { "vagrant1": { "ansible_user": "vagrant", "ansible_host": "127.0.0.1", "ansible_ssh_private_key_file": "~/.vagrant.d/insecure_private_key", "ansible_port": "2222" }, "vagrant2": { "ansible_user": "vagrant", "ansible_host": "127.0.0.1", "ansible_ssh_private_key_file": "~/.vagrant.d/insecure_private_key", "ansible_port": "2200" }, "vagrant3": { "ansible_user": "vagrant", "ansible_host": "127.0.0.1", "ansible_ssh_private_key_file": "~/.vagrant.d/insecure_private_key", "ansible_port": "2201" } }
Writing a Dynamic Inventory Script
One of the handy features of Vagrant is that you can see which machines are currently running by using the vagrant status command. Assuming we have a Vagrantfile that looks like Example 4-3, if we run vagrant status, the output will look like Example 4-11.
Example 4-11. Output of vagrant status
$ vagrant status
Current machine states:

vagrant1                  running (virtualbox)
vagrant2                  running (virtualbox)
vagrant3                  running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run 'vagrant status NAME'.
Because Vagrant already keeps track of machines for us, there’s no need for us to list them in an Ansible inventory file. Instead, we can write a dynamic inventory script that queries Vagrant about which machines are running. Once we’ve set up a dynamic inventory script for Vagrant, even if we alter our Vagrantfile to run different numbers of Vagrant machines, we won’t need to edit an Ansible inventory file.
Let’s work through an example of creating a dynamic inventory script that retrieves the details about hosts from Vagrant. Our dynamic inventory script is going to need to invoke the vagrant status command. The output shown in Example 4-11 is designed for humans to read. We can get a list of running hosts in a format that is easier for computers to parse with the --machine-readable flag, like so:
$ vagrant status --machine-readable
The output looks like this:
1620831617,vagrant1,metadata,provider,virtualbox
1620831617,vagrant2,metadata,provider,virtualbox
1620831618,vagrant3,metadata,provider,virtualbox
1620831619,vagrant1,provider-name,virtualbox
1620831619,vagrant1,state,running
1620831619,vagrant1,state-human-short,running
1620831619,vagrant1,state-human-long,The VM is running. To stop this VM%!(VAGRANT_COMMA) you can run `vagrant halt` to\nshut it down forcefully%!(VAGRANT_COMMA) or you can run `vagrant suspend` to simply\nsuspend the virtual machine. In either case%!(VAGRANT_COMMA) to restart it again%!(VAGRANT_COMMA)\nsimply run `vagrant up`.
1620831619,vagrant2,provider-name,virtualbox
1620831619,vagrant2,state,running
1620831619,vagrant2,state-human-short,running
1620831619,vagrant2,state-human-long,The VM is running. To stop this VM%!(VAGRANT_COMMA) you can run `vagrant halt` to\nshut it down forcefully%!(VAGRANT_COMMA) or you can run `vagrant suspend` to simply\nsuspend the virtual machine. In either case%!(VAGRANT_COMMA) to restart it again%!(VAGRANT_COMMA)\nsimply run `vagrant up`.
1620831620,vagrant3,provider-name,virtualbox
1620831620,vagrant3,state,running
1620831620,vagrant3,state-human-short,running
1620831620,vagrant3,state-human-long,The VM is running. To stop this VM%!(VAGRANT_COMMA) you can run `vagrant halt` to\nshut it down forcefully%!(VAGRANT_COMMA) or you can run `vagrant suspend` to simply\nsuspend the virtual machine. In either case%!(VAGRANT_COMMA) to restart it again%!(VAGRANT_COMMA)\nsimply run `vagrant up`.
1620831620,,ui,info,Current machine states:\n\nvagrant1 running (virtualbox)\nvagrant2 running (virtualbox)\nvagrant3 running (virtualbox)\n\nThis environment represents multiple VMs. The VMs are all listed\nabove with their current state. For more information about a specific\nVM%!(VAGRANT_COMMA) run `vagrant status NAME`
To get details about a particular Vagrant machine, say, vagrant2, we would run this:
$ vagrant ssh-config vagrant2
The output looks like this:
Host vagrant2
  HostName 127.0.0.1
  User vagrant
  Port 2200
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/lorin/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL
Our dynamic inventory script will need to call these commands, parse the outputs, and output the appropriate JSON. We can use the Paramiko library to parse the output of vagrant ssh-config. First, install the Python Paramiko library with pip:
$ pip3 install --user paramiko
Here’s an interactive Python session that shows how to use the Paramiko library to do this:
$ python3
>>> import io
>>> import subprocess
>>> import paramiko
>>> cmd = ["vagrant", "ssh-config", "vagrant2"]
>>> ssh_config = subprocess.check_output(cmd).decode("utf-8")
>>> config = paramiko.SSHConfig()
>>> config.parse(io.StringIO(ssh_config))
>>> host_config = config.lookup("vagrant2")
>>> print(host_config)
{'hostname': '127.0.0.1', 'user': 'vagrant', 'port': '2200', 'userknownhostsfile': '/dev/null', 'stricthostkeychecking': 'no', 'passwordauthentication': 'no', 'identityfile': ['/Users/bas/.vagrant.d/insecure_private_key'], 'identitiesonly': 'yes', 'loglevel': 'FATAL'}
Example 4-12 shows our complete vagrant.py script.
Example 4-12. vagrant.py
#!/usr/bin/env python3
""" Vagrant inventory script """
# Adapted from Mark Mandel's implementation
# https://github.com/markmandel/vagrant_ansible_example

import argparse
import io
import json
import subprocess
import sys

import paramiko


def parse_args():
    """command-line options"""
    parser = argparse.ArgumentParser(description="Vagrant inventory script")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--list', action='store_true')
    group.add_argument('--host')
    return parser.parse_args()


def list_running_hosts():
    """vagrant.py --list function"""
    cmd = ["vagrant", "status", "--machine-readable"]
    status = subprocess.check_output(cmd).rstrip().decode("utf-8")
    hosts = []
    for line in status.splitlines():
        (_, host, key, value) = line.split(',')[:4]
        if key == 'state' and value == 'running':
            hosts.append(host)
    return hosts


def get_host_details(host):
    """vagrant.py --host <hostname> function"""
    cmd = ["vagrant", "ssh-config", host]
    ssh_config = subprocess.check_output(cmd).decode("utf-8")
    config = paramiko.SSHConfig()
    config.parse(io.StringIO(ssh_config))
    host_config = config.lookup(host)
    return {'ansible_host': host_config['hostname'],
            'ansible_port': host_config['port'],
            'ansible_user': host_config['user'],
            'ansible_private_key_file': host_config['identityfile'][0]}


def main():
    """main"""
    args = parse_args()
    if args.list:
        hosts = list_running_hosts()
        json.dump({'vagrant': hosts}, sys.stdout)
    else:
        details = get_host_details(args.host)
        json.dump(details, sys.stdout)


if __name__ == '__main__':
    main()
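Once the script is marked executable, you can point Ansible at it like any other inventory source; for example:

$ ansible all -i vagrant.py -m ping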
Breaking the Inventory into Multiple Files
If you want to have both a regular inventory file and a dynamic inventory script (or, really, any combination of static and dynamic inventory files), just put them all in the same directory and configure Ansible to use that directory as the inventory. You can do this via the inventory parameter in ansible.cfg or by using the -i flag on the command line. Ansible will process all of the files and merge the results into a single inventory.
This means that you can create one inventory directory to use with Ansible on the command line with hosts running in Vagrant, Amazon EC2, Google Cloud Platform, Microsoft Azure, or wherever you need them!
For example, Bas’s directory structure looks like this:
- inventory/aws_ec2.yml
- inventory/azure_rm.yml
- inventory/group_vars/vagrant
- inventory/group_vars/staging
- inventory/group_vars/production
- inventory/hosts
- inventory/vagrant.py
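With everything in one directory, a single -i flag (or the inventory setting from Example 4-1) pulls in all of these sources at once:

$ ansible all -i inventory --list-hosts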
Adding Entries at Runtime with add_host and group_by
Ansible will let you add hosts and groups to the inventory during the execution of a playbook. This is useful when managing dynamic clusters, such as Redis Sentinel.
add_host
The add_host module adds a host to the inventory; this is useful if you’re using Ansible to provision new virtual machine instances inside an infrastructure-as-a-service cloud.
Invoking the module looks like this:
- name: Add the host
  add_host:
    name: hostname
    groups: web,staging
    myvar: myval
Specifying the list of groups and additional variables is optional.
Here’s the add_host module in action, bringing up a new Vagrant machine and then configuring the machine:

---
- name: Provision a Vagrant machine
  hosts: localhost
  vars:
    box: centos/stream8
  tasks:
    - name: Create a Vagrantfile
      command: "vagrant init {{ box }}"
      args:
        creates: Vagrantfile

    - name: Bring up the vagrant machine
      command: vagrant up
      args:
        creates: .vagrant/machines/default/virtualbox/box_meta

    - name: Add the vagrant machine to the inventory
      add_host:
        name: default
        ansible_host: 127.0.0.1
        ansible_port: 2222
        ansible_user: vagrant
        ansible_private_key_file: >
          .vagrant/machines/default/virtualbox/private_key

- name: Do something to the vagrant machine
  hosts: default
  tasks:
    # The list of tasks would go here
    - name: ping
      ping:
...
Note
The add_host module adds the host only for the duration of the execution of the playbook. It does not modify your inventory file.
When we provision inside our playbooks, we like to split the work into two plays. The first play runs against localhost and provisions the hosts, and the second play configures the hosts.
Note that we use the creates: Vagrantfile argument in this task:

- name: Create a Vagrantfile
  command: "vagrant init {{ box }}"
  args:
    creates: Vagrantfile
This tells Ansible that if the Vagrantfile is already present, there is no need to run the command again. Ensuring that the (potentially nonidempotent) command is run only once is a way of achieving idempotence in a playbook that invokes the command module. The same is done for the task that runs vagrant up.
group_by
Ansible’s group_by module allows you to create new groups while a playbook is executing. Any group you create will be based on the value of a variable that has been set on each host, which Ansible refers to as a fact. (Chapter 5 covers facts in more detail.)

If Ansible fact gathering is enabled, Ansible will associate a set of variables with a host. For example, the ansible_machine variable will be i386 for 32-bit x86 machines and x86_64 for 64-bit x86 machines. If Ansible is interacting with a mix of such hosts, we can create i386 and x86_64 groups with a task like the sketch below.
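A minimal sketch of such a task (the task name is illustrative; the ansible_facts.machine fact reports values such as i386 and x86_64):

- name: Create groups based on CPU architecture
  group_by:
    key: "{{ ansible_facts.machine }}"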
If we’d rather group our hosts by Linux distribution (for example, Ubuntu or CentOS), we can use the ansible_facts.distribution fact:

- name: Create groups based on Linux distribution
  group_by:
    key: "{{ ansible_facts.distribution }}"
In Example 4-13, we use group_by to create separate groups for our Ubuntu and CentOS hosts; then we use the apt module to install packages onto the Ubuntu hosts and the yum module to install packages onto the CentOS hosts.
Example 4-13. Creating ad hoc groups based on Linux distribution
---
- name: Group hosts by distribution
  hosts: all
  gather_facts: true
  tasks:
    - name: Create groups based on distro
      group_by:
        key: "{{ ansible_facts.distribution }}"

- name: Do something to Ubuntu hosts
  hosts: Ubuntu
  become: true
  tasks:
    - name: Install jdk and jre
      apt:
        update_cache: true
        name:
          - openjdk-11-jdk-headless
          - openjdk-11-jre-headless

- name: Do something else to CentOS hosts
  hosts: CentOS
  become: true
  tasks:
    - name: Install jdk
      yum:
        name:
          - java-11-openjdk-headless
          - java-11-openjdk-devel
1 This term has been popularized by Randy Bias of Cloudscaling.