Chapter 4. Working with Docker Images

Every Docker container is based on an image. Images are the underlying definition of what gets reconstituted into a running container, much like a virtual disk becomes a virtual machine when you start it up. Docker images provide the basis for everything that you will ever deploy and run with Docker. To launch a container, you must either download a public image or create your own. You can think of the image as the filesystem for the container. But under the covers, every Docker image consists of one or more filesystem layers that generally have a direct one-to-one mapping to each individual build step used to create that image.

Because images are layered, they place special demands on the Linux kernel, which must provide the drivers that Docker needs to run the storage backend. For image management, Docker relies heavily on this storage backend, which communicates with the underlying Linux filesystem to build and manage the multiple layers that combine into a single usable image. The primary storage backends that are supported include AUFS, BTRFS, Device-mapper, and overlay2. Each storage backend provides a fast copy-on-write (CoW) system for image management. We discuss the specifics of various backends in Chapter 11. For now, we’ll use the default backend and explore how images work, since they make up the basis for almost everything else that you will do with Docker, including:

  • Building images

  • Uploading (pushing) images to an image registry

  • Downloading (pulling) images from an image registry

  • Creating and running containers from an image
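
If you are curious which storage backend your own installation is using, docker info will report it. The exact output depends on your platform and Docker version, but it should look something like this:

$ docker info --format '{{ .Driver }}'
overlay2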

Anatomy of a Dockerfile

To create a custom Docker image with the default tools, you will need to become familiar with the Dockerfile. This file describes all the steps that are required to create an image and would usually be contained within the root directory of the source code repository for your application.

A typical Dockerfile might look something like the one shown here, which creates a container for a Node.js-based application:

FROM node:11.11.0

LABEL "maintainer"="anna@example.com"
LABEL "rating"="Five Stars" "class"="First Class"

USER root

ENV AP /data/app
ENV SCPATH /etc/supervisor/conf.d

RUN apt-get -y update

# The daemons
RUN apt-get -y install supervisor
RUN mkdir -p /var/log/supervisor

# Supervisor Configuration
ADD ./supervisord/conf.d/* $SCPATH/

# Application Code
ADD *.js* $AP/

WORKDIR $AP

RUN npm install

CMD ["supervisord", "-n"]

Dissecting this Dockerfile will provide some initial exposure to a number of the possible instructions for controlling how an image is assembled. Each line in a Dockerfile creates a new image layer that is stored by Docker. This layer contains all of the changes that are a result of that command being issued. This means that when you build new images, Docker will only need to build layers that deviate from previous builds: you can reuse all the layers that haven’t changed.

Although you could build a Node instance from a plain base Linux image, you can also explore Docker Hub for official Node images. The Node.js community maintains a series of Docker images and tags that allow you to quickly determine what versions are available. If you want to lock the image to a specific point release of Node, you could point it at something like node:11.11.0. The base image that follows provides a Debian Linux image running Node 11.11.x.

FROM node:11.11.0

Applying labels to images and containers allows you to add metadata via key/value pairs that can later be used to search for and identify Docker images and containers. You can see the labels applied to any image using the docker inspect command.

LABEL "maintainer"="anna@example.com"
LABEL "rating"="Five Stars" "class"="First Class"
Note

MAINTAINER is a deprecated field in the Dockerfile specification. It was designed to provide contact information for the Dockerfile’s author, but Docker now recommends simply using a label for this purpose.

By default, Docker runs all processes as root within the container, but you can use the USER instruction to change this:

USER root
Caution

Even though containers provide some isolation from the underlying operating system, they still run on the host kernel. Due to potential security risks, production containers should almost always be run under the context of a nonprivileged user.
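
As a small sketch of that advice (the username is arbitrary and not part of this chapter’s example), a Debian-based image could create and switch to an unprivileged user like this:

# Create an unprivileged user and run subsequent steps as that user
RUN useradd --create-home --shell /bin/bash appuser
USER appuser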

The ENV instruction allows you to set shell variables that can be used by your running application for configuration and during the build process to simplify the Dockerfile and help keep it DRYer:1

ENV AP /data/app
ENV SCPATH /etc/supervisor/conf.d

In the following code, you’ll use a collection of RUN instructions to create the required file structure and install some required software dependencies.

RUN apt-get -y update

# The daemons
RUN apt-get -y install supervisor
RUN mkdir -p /var/log/supervisor
Warning

While we’re demonstrating it here for simplicity, it is not recommended that you run commands like apt-get -y update or yum -y update in your application’s Dockerfile. This is because it requires crawling the repository index each time you run a build, and means that your build is not guaranteed to be repeatable since package versions might change between builds. Instead, consider basing your application image on another image that already has these updates applied to it and where the versions are in a known state. It will be faster and more repeatable.
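
If you do need to refresh the package index at build time, a common compromise (shown here only as a sketch) is to chain the update, install, and cleanup into a single RUN instruction so the index never persists in its own layer:

RUN apt-get update && \
    apt-get install -y supervisor && \
    rm -rf /var/lib/apt/lists/*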

The ADD instruction is used to copy files from either the local filesystem or a remote URL into your image. Most often this will include your application code and any required support files. Because ADD actually copies the files into the image, you no longer need access to the local filesystem to access them once the image is built. You’ll also start to use the build variables you defined in the previous section to save you a bit of work and help protect you from typos.

# Supervisor Configuration
ADD ./supervisord/conf.d/* $SCPATH/

# Application Code
ADD *.js* $AP/
Tip

Remember that every instruction creates a new Docker image layer, so it often makes sense to combine a few logically grouped commands onto a single line. It is even possible to use the ADD instruction in combination with the RUN instruction to copy a complex script to your image and then execute that script with only two commands in the Dockerfile.
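
For instance, a hypothetical setup script (the filename is illustrative and not part of this repo) could be copied in and executed with only two instructions:

# Copy in and run a setup script (script name is hypothetical)
ADD ./scripts/setup.sh /tmp/setup.sh
RUN /bin/bash /tmp/setup.sh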

With the WORKDIR instruction, you change the working directory in the image for the remaining build instructions and the default process that launches with any resulting containers:

WORKDIR $AP

RUN npm install
Caution

The order of commands in a Dockerfile can have a very significant impact on ongoing build times. You should try to order commands so that things that change between every single build are closer to the bottom. This means that adding your code and similar steps should be held off until the end. When you rebuild an image, every single layer after the first introduced change will need to be rebuilt.

And finally you end with the CMD instruction, which defines the command that launches the process that you want to run within the container:

CMD ["supervisord", "-n"]
Note

Though not a hard and fast rule, it is generally considered a best practice to try to run only a single process within a container. The core idea is that a container should provide a single function so that it remains easy to horizontally scale individual functions within your architecture. In the example, you are using supervisord as a process manager to help improve the resiliency of the node application within the container and ensure that it stays running. This can also be useful for troubleshooting your application during development, so that you can restart your service without restarting the whole container.

Building an Image

To build your first image, go ahead and clone a Git repo that contains an example application called docker-node-hello, as shown here:2

$ git clone https://github.com/spkane/docker-node-hello.git \
    --config core.autocrlf=input
Cloning into 'docker-node-hello'...
remote: Counting objects: 41, done.
remote: Total 41 (delta 0), reused 0 (delta 0), pack-reused 41
Unpacking objects: 100% (41/41), done.
$ cd docker-node-hello
Note

Git is frequently installed on Linux and macOS systems, but if you do not already have Git available, you can download a simple installer from git-scm.com.

The --config core.autocrlf=input option we use helps ensure that the line endings are not accidentally altered from the Linux standard that is expected.

This will download a working Dockerfile and related source code files into a directory called docker-node-hello. If you look at the contents while ignoring the Git repo directory, you should see the following:

$ tree -a -I .git
.
├── .dockerignore
├── .gitignore
├── Dockerfile
├── Makefile
├── README.md
├── Vagrantfile
├── index.js
├── package.json
└── supervisord
    └── conf.d
        ├── node.conf
        └── supervisord.conf

Let’s review the most relevant files in the repo.

The Dockerfile should be exactly the same as the one you just reviewed.

The .dockerignore file allows you to define files and directories that you do not want uploaded to the Docker host when you are building the image. In this instance, the .dockerignore file contains the following line:

.git

This instructs docker build to exclude the .git directory, which contains the whole source code repository, including Git configuration data and every single change that you have ever made to your code. The rest of the files reflect the current state of your source code on the checked-out branch. You don’t need the contents of the .git directory to build the Docker image, and since it can grow quite large over time, you don’t want to waste time copying it every time you do a build.

package.json defines the Node.js application and lists any dependencies that it relies on.

index.js is the main source code for the application.

The supervisord directory contains the configuration files for supervisord that you will use to start and monitor the application.

Note

Using supervisord in this example to monitor the application is overkill, but it is intended to provide a bit of insight into some of the techniques you can use in a container to provide more control over your application and its running state.

As we discussed in Chapter 3, you will need to have your Docker server running and your client properly set up to communicate with it before you can build a Docker image. Assuming that this is all working, you should be able to initiate a new build by running the upcoming command, which will build and tag an image based on the files in the current directory.

Each step identified in the following output maps directly to a line in the Dockerfile, and each step creates a new image layer based on the previous step. The first build that you run will take a few minutes because you have to download the base node image. Subsequent builds should be much faster unless a newer node 11.11.0 base image has been released. Let’s run the build:

$ docker build -t example/docker-node-hello:latest .

Sending build context to Docker daemon  15.87kB
Step 1/14 : FROM node:11.11.0
 ---> 9ff38e3a6d9d
Step 2/14 : LABEL "maintainer"="anna@example.com"
 ---> Running in 1d874dd2d5fa
Removing intermediate container 1d874dd2d5fa
 ---> 6113004a627c
Step 3/14 : LABEL "rating"="Five Stars" "class"="First Class"
 ---> Running in 99b5cf62f37a
Removing intermediate container 99b5cf62f37a
 ---> 9b674b79f9f8
Step 4/14 : USER root
 ---> Running in d8cc28917a0c
Removing intermediate container d8cc28917a0c
 ---> a73840c164af
Step 5/14 : ENV AP /data/app
 ---> Running in 879df989d503
Removing intermediate container 879df989d503
 ---> 022f5be79fb0
Step 6/14 : ENV SCPATH /etc/supervisor/conf.d
 ---> Running in 1386343bbcc5
Removing intermediate container 1386343bbcc5
 ---> 181728649ade
Step 7/14 : RUN apt-get -y update
 ---> Running in 460ddd2c7cd6
Get:1 http://security.debian.org jessie/updates
...
Reading package lists...
Removing intermediate container 460ddd2c7cd6
 ---> 9907f012a8e8
Step 8/14 : RUN apt-get -y install supervisor
 ---> Running in 03e9d7c7bd1e
Reading package lists...
...
Processing triggers for systemd (215-17+deb8u5) ...
Removing intermediate container 03e9d7c7bd1e
 ---> 4517cc222b0b
Step 9/14 : RUN mkdir -p /var/log/supervisor
 ---> Running in b0646ddd804a
Removing intermediate container b0646ddd804a
 ---> c45f74d9b9d7
Step 10/14 : ADD ./supervisord/conf.d/* $SCPATH/
 ---> 653eeae68288
Step 11/14 : ADD *.js* $AP/
 ---> 54ced594abe5
Step 12/14 : WORKDIR $AP
Removing intermediate container c065270d7e4b
 ---> d1b5e1d93364
Step 13/14 : RUN npm install
 ---> Running in 0c2dc15cab8d
npm WARN deprecated connect@2.7.9: connect 2.x series is deprecated
...
bytes@0.2.0, formidable@1.0.13)
Removing intermediate container 0c2dc15cab8d
 ---> 6b51bb6d8872
Step 14/14 : CMD ["supervisord", "-n"]
 ---> Running in a7be8f7416a1
Removing intermediate container a7be8f7416a1
 ---> 3a7881d5536e
Successfully built 3a7881d5536e
Successfully tagged example/docker-node-hello:latest
Tip

To improve the speed of builds, Docker will use a local cache when it thinks it is safe. This can sometimes lead to unexpected issues because it doesn’t always notice that something changed in a lower layer. In the preceding output you will notice lines like ---> Running in 0c2dc15cab8d. If instead you see ---> Using cache, you know that Docker decided to use the cache. You can disable the cache for a build by using the --no-cache argument to the docker build command.

If you are building your Docker images on a system that is used for other simultaneous processes, you can limit the resources available to your builds by using many of the same cgroup methods that we will discuss in Chapter 5. You can find detailed documentation on the docker build arguments in the official documentation.

Troubleshooting Broken Builds

We normally expect builds to just work, especially when we’ve scripted them, but in the real world things go wrong. Let’s spend a little bit of time discussing what you can do to troubleshoot a Docker build that is failing.

We need a patient for the next exercise, so let’s create a failing build. To do that, edit the Dockerfile so that the line that reads:

RUN apt-get -y update

now reads:

RUN apt-get -y update-all

If you try to build the image now, you should get the following error:

$ docker build -t example/docker-node-hello:latest --no-cache .

Sending build context to Docker daemon  15.87kB
...
Step 6/14 : ENV SCPATH /etc/supervisor/conf.d
 ---> Running in 9c0a385269cf
Removing intermediate container 9c0a385269cf
 ---> 8a773166616c
Step 7/14 : RUN apt-get -y update-all
 ---> Running in cd57fc47503d
E: Command line option 'y' [from -y] is not known.
The command '/bin/sh -c apt-get -y update-all' returned a non-zero code: 100

This is a good example of a broken build, because the error is actually very misleading, and if we just assumed that -y was the problem, we would quickly discover that it was not. So, how can we troubleshoot this, especially if we are not developing on a Linux system? The real trick here is to remember that almost all Docker images are layered on top of other Docker images, and that you can start a container from any image. Although the meaning is not obvious on the surface, if you look at the output for step 6, you will see this:

Step 6/14 : ENV SCPATH /etc/supervisor/conf.d
 ---> Running in 9c0a385269cf
Removing intermediate container 9c0a385269cf
 ---> 8a773166616c

The first line that reads Running in 9c0a385269cf is telling you that the build process has started a new container, based on the image created in step 5. The next line, which reads Removing intermediate container 9c0a385269cf, is telling you that Docker is now removing the container, after having altered it based on the instruction in step 6. In this case, it was simply adding a default environment variable via ENV SCPATH /etc/supervisor/conf.d. The final line, which reads ---> 8a773166616c, is the one we really care about, because this is giving us the image ID for the image that was generated by step 6. You need this to troubleshoot the build, because it is the image from the last successful step in the build.

With this information, it is possible to run an interactive container so that you can try to determine why your build is not working properly. Remember that every container image is based on the image layers below it. One of the great benefits of that is that we can just run the lower layer as a container itself, using a shell to look around!

$ docker run --rm -ti 8a773166616c /bin/bash
root@464e8e35c784:/#

From inside the container, you can now run any commands that you might need to determine what is causing your build to fail and what you need to do to fix your Dockerfile.

root@464e8e35c784:/# apt-get -y update-all
E: Command line option 'y' [from -y] is not known.

root@464e8e35c784:/# apt-get update-all
E: Invalid operation update-all

root@464e8e35c784:/# apt-get --help
apt 1.0.9.8.3 for amd64 compiled on Mar 12 2016 13:31:17
Usage: apt-get [options] command
       apt-get [options] install|remove pkg1 [pkg2 ...]
       apt-get [options] source pkg1 [pkg2 ...]

apt-get is a simple command line interface for downloading and
installing packages. The most frequently used commands are update
and install.

Commands:
   update - Retrieve new lists of packages
...

Options:
...
  -y  Assume Yes to all queries and do not prompt
...

See the apt-get(8), sources.list(5) and apt.conf(5) manual
pages for more information and options.
                       This APT has Super Cow Powers.

root@464e8e35c784:/# apt-get -y update
Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
...
Reading package lists... Done

root@464e8e35c784:/# exit
exit

Once the root cause has been determined, the Dockerfile can be fixed, so that RUN apt-get -y update-all now reads RUN apt-get -y update, and then rebuilding the image should result in success.

$ docker build -t example/docker-node-hello:latest .
Sending build context to Docker daemon  15.87kB
...
Successfully built 69f5e83bb86e
Successfully tagged example/docker-node-hello:latest

Running Your Image

Once you have successfully built the image, you can run it on your Docker host with the following command:

$ docker run -d -p 8080:8080 example/docker-node-hello:latest

The preceding command tells Docker to create a running container in the background from the image with the example/docker-node-hello:latest tag, and then map port 8080 in the container to port 8080 on the Docker host. If everything goes as expected, the new Node.js application should be running in a container on the host. You can verify this by running docker ps. To see the running application in action, you will need to open up a web browser and point it at port 8080 on the Docker host. You can usually determine the Docker host IP address by simply printing out the value of the DOCKER_HOST environment variable unless you are only running Docker locally, in which case 127.0.0.1 should work. Docker Machine users can also simply use docker-machine ip.

$ echo $DOCKER_HOST
tcp://127.0.0.1:2376

Get the IP address and enter something like http://127.0.0.1:8080/ (or your remote Docker address if it’s different than that) into your web browser address bar. You should see the following text:

Hello World. Wish you were here.
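
If you would rather check from the command line, a quick curl against the same address (adjust the IP if your Docker host is remote) should return the same message:

$ curl http://127.0.0.1:8080/
Hello World. Wish you were here.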
Warning

If you are running Docker CE locally, you may not have $DOCKER_HOST set and can just assume localhost or 127.0.0.1.

Environment Variables

If you read the index.js file, you will notice that part of the file refers to the variable $WHO, which the application uses to determine who it is going to say Hello to:

var DEFAULT_WHO = "World";
var WHO = process.env.WHO || DEFAULT_WHO;

app.get('/', function (req, res) {
  res.send('Hello ' + WHO + '. Wish you were here.\n');
});

Let’s quickly cover how you can configure this application by passing in environment variables when you start it. First you need to stop the existing container using two commands. The first command will provide you with the container ID, which you will need to use in the second command:

$ docker ps
CONTAINER ID  IMAGE                             STATUS       ...
b7145e06083f  example/docker-node-hello:latest  Up 4 minutes ...
Note

You can format the output of docker ps by using a Go template so that you see only the information that you care about. In the preceding example you might decide to run something like docker ps --format "table {{.ID}}\t{{.Image}}\t{{.Status}}" to limit the output to the three fields you care about. Additionally, running docker ps --quiet with no format options will limit the output to only the container ID.

And then, using the container ID from the previous output, you can stop the running container by typing:

$ docker stop b7145e06083f
b7145e06083f

You can then restart the container by adding one argument to the previous docker run command:

$ docker run -d -p 8080:8080 -e WHO="Sean and Karl" \
    example/docker-node-hello:latest

If you reload your web browser, you should see that the text on the web page now reads:

Hello Sean and Karl. Wish you were here.

Custom Base Images

Base images are the lowest-level images that other Docker images will build upon. Most often, these are based on minimal installs of Linux distributions like Ubuntu, Fedora, CentOS, or Alpine Linux, but they can also be much smaller, containing a single statically compiled binary. For most people, using the official base images for their favorite distribution or tool is a great option.

However, there are times when it is preferable to build your own base images rather than use an image created by someone else. One reason to do this would be to maintain a consistent OS image across all your deployment methods for hardware, VMs, and containers. Another would be to get the image size down substantially. There is no need to ship around an entire Ubuntu distribution, for example, if your application is a statically built C or Go application. You might find that you only need the tools you regularly use for debugging and some other shell commands and binaries. Making the effort to build such an image could pay off in better deployment times and easier application distribution.

A common middle ground between these two approaches is to build images using Alpine Linux, which is designed to be very small and is popular as a basis for Docker images. To keep the distribution size very small, Alpine Linux is based on the modern, lightweight musl standard library, instead of the more traditional GNU libc. In general, this is not a big issue, since many packages support musl, but it is something to be aware of. It has the largest impact on Java-based applications and DNS resolution. It’s widely used in production, however, because of its diminutive image size. It’s highly optimized for space, which is the reason that Alpine Linux ships with /bin/sh, instead of /bin/bash, by default. However, you can also install glibc and bash in Alpine Linux if you really need it, and this is often done in the case of JVM containers.
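
If you do need bash while working with an Alpine-based image, a minimal sketch (the tag is just an example) looks like this:

FROM alpine:3.9
# bash is not included in Alpine by default, so install it explicitly
RUN apk add --no-cache bash
CMD ["/bin/bash"]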

In the official Docker documentation, there is some good information about how you can build base images on the various Linux distributions.

Storing Images

Now that you have created a Docker image that you’re happy with, you’ll want to store it somewhere so that it can be easily accessed by any Docker host that you want to deploy it to. This is also the normal hand-off point between building images and storing them somewhere for future deployment. You don’t normally build the images on a production server and then run them. This process was described when we talked about handoff between teams for application deployment. Ordinarily, deployment is the process of pulling an image from a repository and running it on one or more Docker servers. There are a few ways you can go about storing your images into a central repository for easy retrieval.

Public Registries

Docker provides an image registry for public images that the community wants to share. These include official images for Linux distributions, ready-to-go WordPress containers, and much more.

If you have images that can be published to the internet, the best place for them is a public registry, like Docker Hub. However, there are other options. When the core Docker tools were first gaining popularity, Docker Hub did not exist. To fill this obvious void in the community, Quay.io was created. Since then, Quay.io was purchased by CoreOS3 and has been used to create the CoreOS Enterprise Registry product, which we will discuss in a moment. Cloud vendors like Google also have their own registries, and a number of third parties have also joined the fray. Here we’ll just talk about the two most popular.

Both Docker Hub and Quay.io provide centralized Docker image registries that can be accessed from anywhere on the internet, and provide a method to store private images in addition to public ones. Both have nice user interfaces and the ability to separate team access permissions and manage users. Both also offer reasonable commercial options for private SaaS hosting of your images, much in the same way that GitHub sells private registries on their systems. This is probably the right first step if you’re getting serious about Docker but are not yet shipping enough code to need an internally hosted solution.

Tip

As you become comfortable with Docker Hub, you may also want to consider investigating Docker Cloud, which adds a significant feature set on top of the Docker Hub registry.

For companies that use Docker heavily, one of the biggest downsides to these registries is that they are not local to the network on which the application is being deployed. This means that every layer of every deployment might need to be dragged across the internet in order to deploy an application. Internet latencies have a very real impact on software deployments, and outages that affect these registries could have a very detrimental impact on a company’s ability to deploy smoothly and on schedule. This is mitigated by good image design where you make thin layers that are easy to move around the internet.

Private Registries

The other option that many companies consider is to host some type of Docker image registry internally, which can interact with the Docker client to support pushing, pulling, and searching images. The version 1 registry, Docker Registry, has been replaced with the version 2 registry, called Docker Distribution.

Other strong contenders in the private registry space include the Docker Trusted Registry and the Quay Enterprise Registry. In addition to the basic Docker registry functionality, these products have solid GUI interfaces and many additional features, like image verification.

Authenticating to a Registry

Communicating with a registry that stores container images is a part of daily life with Docker. For many registries, this means you’ll need to authenticate to gain access to images. But Docker also tries to make it easy to automate things so it can store your login information and use it on your behalf when you request things like pulling down a private image. By default, Docker assumes the registry will be Docker Hub, the public repository hosted by Docker, Inc.

Creating a Docker Hub account

For these examples, you will create an account on Docker Hub. You don’t need an account to use publicly shared images, but you will need one to upload your own public or private containers.

To create your account, use your web browser of choice to navigate to Docker Hub.

From there, you can log in via an existing account or create a new login based on your email address. When you create your account, Docker Hub sends a verification email to the address that you provided during signup. You should immediately log in to your email account and click the verification link inside the email to finish the validation process.

At this point, you have created an account on a public registry to which you can upload new images. The Settings option under your profile picture will allow you to change your repositories to private ones if that is what you need.

Logging in to a registry

Now let’s log in to the Docker Hub registry using our account:

$ docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't
have a Docker ID, head over to https://hub.docker.com to create one.
Username: <hub-username>
Password: <hub-password>
Login Succeeded

When you get Login Succeeded back from the server, you know you’re ready to pull images from the registry. But what happened under the covers? It turns out that Docker has written a dotfile for us in our home directory to cache this information. The permissions are set to 0600 as a security precaution against other users reading your credentials. You can inspect the file with something like:

$ ls -la ~/.docker/config.json
-rw-------@ 1 ...  158 Dec 24 10:37 /Users/someuser/.docker/config.json
$ cat ~/.docker/config.json
{
    "auths": {
    "https://index.docker.io/v1/": {
      "auth":"cmVsaEXamPL3hElRmFCOUE=",
      "email":"someuser@example.com"
    }
  }
}
Note

Docker is constantly evolving and has added support for many OS-native secret management systems, like the macOS keychain, so your config.json file might look significantly different from the example. There is also a set of credential helpers for different platforms that can make your life easier here.

Here you can see the .docker/config.json file, owned by someuser, and the stored credentials in JSON format. Note that this can support multiple registries at once. In this case, you just have one entry, for Docker Hub, but you could have more if you needed it. From now on, when the registry needs authentication, Docker will look in .docker/config.json to see if you have credentials stored for this hostname. If so, it will supply them. You will notice that one value is completely lacking here: a timestamp. These credentials are cached forever or until you tell Docker to remove them, whichever comes first.

As with logging in, you can also log out of a registry if you no longer want to cache the credentials:

$ docker logout
Removing login credentials for https://index.docker.io/v1/
$ cat ~/.docker/config.json
{
  "auths": {
  }
}

Here you have removed the cached credentials and they are no longer stored by Docker. Some versions of Docker may even remove this file if it is completely empty. If you were trying to log in to something other than the Docker Hub registry, you could supply the hostname on the command line:

$ docker login someregistry.example.com

This would then add another auth entry into our .docker/config.json file.
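
Purely as an illustration (the second auth string here is made up), the file might then contain two entries, something like this:

{
  "auths": {
    "https://index.docker.io/v1/": {
      "auth": "cmVsaEXamPL3hElRmFCOUE=",
      "email": "someuser@example.com"
    },
    "someregistry.example.com": {
      "auth": "c29tZUV4YW1wbGVBdXRoCg=="
    }
  }
}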

Pushing images into a repository

The first step required to push your image is to ensure that you are logged into the Docker repository you intend to use. For this example we will focus on Docker Hub, so ensure that you are logged into Docker Hub with your personal credentials.

$ docker login
Username: <your_user>
Password: <your_password>
Email: <your_email@your_domain.com>
Login Succeeded

Once you are logged in, you can upload an image. Earlier you used the command docker build -t example/docker-node-hello:latest . to build the docker-node-hello image. The example portion of that command refers to a repository. When this is local, it can be anything that you want. However, when you are going to upload your image to a real repository, you need that to match the login.

You can easily edit the tags on the image that you already created by running the following command:

$ docker tag example/docker-node-hello:latest \
    ${<myuser>}/docker-node-hello:latest

If you need to rebuild the image with the new naming convention or simply want to give it a try, you can accomplish this by running the following command in the docker-node-hello working directory that was generated when you performed the Git checkout earlier in the chapter.

Note

For the following examples, you will need to replace ${<myuser>} in all the examples with the user that you created in Docker Hub (or whatever repository you decided to use).

$ docker build -t ${<myuser>}/docker-node-hello:latest .
...

On the first build this will take a little time. If you rebuild the image, you may find that it is very fast. This is because most, if not all, of the layers already exist on your Docker server from the previous build. We can quickly verify that our image is indeed on the server by running docker image ls ${<myuser>}/docker-node-hello:

$ docker image ls ${<myuser>}/docker-node-hello
REPOSITORY                 TAG      IMAGE ID       CREATED             SIZE
myuser/docker-node-hello   latest   f683df27f02d   About an hour ago   649MB
Tip

It is possible to format the output of docker image ls to make it more concise by using the --format argument, like this: docker images --format="table {{.ID }}\t {{.Repository }}".

At this point you can upload the image to the Docker repository by using the docker push command:

$ docker push ${<myuser>}/docker-node-hello:latest
The push refers to a repository [docker.io/myuser/docker-node-hello]
f93b977faea8: Pushed
...
fe4c16cbf7a4: Mounted from library/node
latest: digest: sha256:4ccb6995f9470553fcaa85674455b30370f092a110... size: 2840

If this image was uploaded to a public repository, anyone in the world can now easily download it by running the docker pull command.

Tip

If you uploaded the image to a private repository, then users must log into the private repository using the docker login command before they will be able to pull the image down to their local system.

$ docker pull ${<myuser>}/docker-node-hello:latest
latest: Pulling from myuser/docker-node-hello
Digest: sha256:4ccb6995f9470553fcaa85674455b30370f092a11029f6c7648dc45776d6af06
Status: Image is up to date for myuser/docker-node-hello:latest

Running a Private Registry

In keeping with the spirit of the open source community, Docker encourages the community to share Docker images via Docker Hub by default. There are times, however, when this is not a viable option due to commercial, legal, or reliability concerns.

In these cases, it makes sense to host an internal private registry. Setting up a basic registry is not difficult, but for production use, you should take the time to familiarize yourself with all the available configuration options for Docker Distribution.

For this example, we are going to create a very simple secure registry using SSL and HTTP basic auth.

First let’s create a few directories and files on our Docker server. If you are using a virtual machine or cloud instance to run your Docker server, then you will need to SSH to that server for the next few commands. If you are using Docker Community Edition, then you should be able to run these on your local system.

Note

Windows users: You may need to download additional tools, like htpasswd, or alter the non-Docker commands to accomplish the same tasks on your local system.

First let’s clone a Git repository that contains the basic files required to set up a simple, authenticated Docker registry.

$ git clone https://github.com/spkane/basic-registry --config core.autocrlf=input
Cloning into 'basic-registry'...
remote: Counting objects: 10, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 10 (delta 0), reused 10 (delta 0), pack-reused 0
Unpacking objects: 100% (10/10), done.

Once you have the files locally, you can change directories and examine the files that you have just downloaded.

$ cd basic-registry
$ ls
Dockerfile          config.yml.sample   registry.crt.sample
README.md           htpasswd.sample     registry.key.sample

The Dockerfile simply takes the upstream registry image from Docker Hub and copies some local configuration and support files into a new image.
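
As a rough sketch of that pattern (this is not the repo’s actual Dockerfile; the destination paths are assumptions, and the real ones are determined by the repo’s config.yml), it might look something like this:

FROM registry:2
# Destination paths below are illustrative assumptions, not taken from the repo
COPY config.yml /etc/docker/registry/config.yml
COPY registry.crt /certs/registry.crt
COPY registry.key /certs/registry.key
COPY htpasswd /auth/htpasswd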

For testing, you can use some of the included sample files, but do not use these in production.

If your Docker server is available via localhost (127.0.0.1), then you can use these files unmodified by simply copying each of them like this:

$ cp config.yml.sample config.yml
$ cp registry.key.sample registry.key
$ cp registry.crt.sample registry.crt
$ cp htpasswd.sample htpasswd

If, however, your Docker server is on a remote IP address, then you will need to do a little additional work.

First, copy config.yml.sample to config.yml.

$ cp config.yml.sample config.yml

Then edit config.yml and replace 127.0.0.1 with the IP address of your Docker server so that:

http:
  host: https://127.0.0.1:5000

becomes something like this:

http:
  host: https://172.17.42.10:5000
Note

It is easy to create a registry using a fully qualified domain name (FQDN), like my-registry.example.com, but for this example working with IP addresses is easier because no DNS is required.

Next, you need to create an SSL keypair for your registry’s IP address.

One way to do this is with the following OpenSSL command. Note that you will need to set the IP address in this portion of the command /CN=172.17.42.10 to match your Docker server’s IP address.

$ openssl req -x509 -nodes -sha256 -newkey rsa:4096 \
    -keyout registry.key -out registry.crt \
    -days 14 -subj '/CN=172.17.42.10'

Finally, you can either use the example htpasswd file by copying it:

$ cp htpasswd.sample htpasswd

or you can create your own username and password pair for authentication by using a command like the following, replacing ${<username>} and ${<password>} with your preferred values.

$ docker run --entrypoint htpasswd spkane/basic-registry:latest \
    -Bbn ${<username>} ${<password>} > htpasswd

If you look at the directory listing again, it should now look like this:

$ ls
Dockerfile          config.yml.sample   registry.crt        registry.key.sample
README.md           htpasswd            registry.crt.sample
config.yml          htpasswd.sample     registry.key

If any of these files are missing, review the previous steps to ensure that you did not miss one before moving on.

If everything looks correct, then you should be ready to build and run the registry.

$ docker build -t my-registry .
$ docker run -d -p 5000:5000 --name registry my-registry
$ docker logs registry
Tip

If you see errors like docker: Error response from daemon: Conflict. The container name "/registry" is already in use, then you need to either change the container name above, or remove the existing container with that name. You can remove the container by running docker rm registry.

Testing the private registry

Now that the registry is running, you can test it. The very first thing that you need to do is authenticate against it. You will need to make sure that the IP address in the docker login matches the IP address of your Docker server that is running the registry.

Note

myuser is the default username, and myuser-pw! is the default password. If you generated your own htpasswd, then these will be whatever you choose.

$ docker login 127.0.0.1:5000
Username: <registry-username>
Password: <registry-password>
Login Succeeded
Warning

This registry container is not using any external storage, which means that when you delete the running container, all your images will also be deleted. This is by design. In production you will want to use some type of redundant external storage, like an object store. If you want to keep your development registry images between containers, you could add something like -v /tmp/registry-data:/var/lib/registry to your docker run command to store the registry data on the Docker server.
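
For example, a development registry with persistent storage could be started with something like this, reusing the image built above:

$ docker run -d -p 5000:5000 --name registry \
    -v /tmp/registry-data:/var/lib/registry my-registry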

Now, let’s see if you can push the image you just built into your local private registry.

Tip

In all of these commands, ensure that you use the correct IP address for your registry.

$ docker tag my-registry 127.0.0.1:5000/my-registry
$ docker push 127.0.0.1:5000/my-registry
The push refers to a repository [127.0.0.1:5000/my-registry]
3198e7f79e44: Pushed
...
52a5560f4ca0: Pushed
latest: digest: sha256:b34f35f1810a8b8b64e0eac729a0aa875ca3ccbb4... size: 2194

You can then try to pull the same image from your repository.

$ docker pull 127.0.0.1:5000/my-registry
Using default tag: latest
latest: Pulling from my-registry
Digest: sha256:b34f35f1810a8b8b64e0eac729038f43be2f7d2a0aa875ca3ccbb493f14ea159
Status: Image is up to date for 127.0.0.1:5000/my-registry:latest
Tip

It’s worth keeping in mind that both Docker Hub and Docker Distribution expose an API endpoint that you can query for useful information. You can find out more information about the API via the official documentation.

If you have not encountered any errors, then you have a working registry for development and could build on this foundation to create a production registry. At this point you may want to stop the registry for the time being. You can easily accomplish this by running:

$ docker stop registry
Tip

As you become comfortable with Docker Distribution, you may also want to consider exploring VMware’s open source project, called Harbor, which extends the Docker Distribution with a lot of security- and reliability-focused features.

Advanced Building Techniques

After you have spent a little bit of time working with Docker, you will quickly notice that keeping your image sizes small and your build times fast can be very beneficial in decreasing the time required to build and deploy new versions of your software into production. In this section we will talk a bit about some of the considerations you should always keep in mind when designing your images, and a few techniques that can help achieve these goals.

Keeping Images Small

In most modern businesses, downloading a single 1 GB file from a remote location on the internet is not something that people often worry about. It is so easy to find software on the internet that people will often rely on simply redownloading it if they need it again, instead of keeping a local copy for the future. This may be acceptable when you truly need a single copy of the software on a single server, but it quickly becomes a scaling problem when you need the same software on 100+ nodes and deploy new releases multiple times a day. Downloading these large files can quickly cause network congestion and slower deployment cycles that have a real impact on the production environment.

For convenience, a large number of Docker containers inherit from a base image that contains a minimal Linux distribution. Although this is an easy starting place, it isn’t actually required. Containers only need to contain the files that are required to run the application on the host kernel, and nothing else. The best way to explain this is to explore a very minimal container.

Go is a compiled programming language that by default will generate statically compiled binary files. For this example we are going to use a very small web application written in Go that can be found on GitHub.

Let’s go ahead and try out the application, so that you can see what it does. Run the following command and then open up a web browser and point it to your Docker host on port 8080 (e.g., http://127.0.0.1:8080 for Docker Community Edition):

$ docker run -d -p 8080:8080 adejonge/helloworld

If all goes well, you should see the following message in your web browser: Hello World from Go in minimal Docker container. Now let’s take a look at what files this container comprises. It would be fair to assume that at a minimum it will include a working Linux environment and all the files required to compile Go programs, but you will soon see that this is not the case.

While the container is still running, execute the following command to determine what the container ID is. The following command returns the information for the last container that you created:

$ docker container ls -l
CONTAINER ID IMAGE               COMMAND       CREATED              ...
ddc3f61f311b adejonge/helloworld "/helloworld" 4 minutes ago        ...

You can then use the container ID that you obtained from running the previous command to export the files in the container into a tarball, which can be easily examined.

$ docker export ddc3f61f311b -o web-app.tar

Using the tar command, you can now examine the contents of your container at the time of the export.

$ tar -tvf web-app.tar
-rwxr-xr-x  0 0      0           0 Jan  7 15:54 .dockerenv
drwxr-xr-x  0 0      0           0 Jan  7 15:54 dev/
-rwxr-xr-x  0 0      0           0 Jan  7 15:54 dev/console
drwxr-xr-x  0 0      0           0 Jan  7 15:54 dev/pts/
drwxr-xr-x  0 0      0           0 Jan  7 15:54 dev/shm/
drwxr-xr-x  0 0      0           0 Jan  7 15:54 etc/
-rwxr-xr-x  0 0      0           0 Jan  7 15:54 etc/hostname
-rwxr-xr-x  0 0      0           0 Jan  7 15:54 etc/hosts
lrwxrwxrwx  0 0      0           0 Jan  7 15:54 etc/mtab -> /proc/mounts
-rwxr-xr-x  0 0      0           0 Jan  7 15:54 etc/resolv.conf
-rwxr-xr-x  0 0      0     3604416 Jul  2  2014 helloworld
drwxr-xr-x  0 0      0           0 Jan  7 15:54 proc/
drwxr-xr-x  0 0      0           0 Jan  7 15:54 sys/

The first thing you might notice here is that there are almost no files in this container, and almost all of them are zero bytes in length. All of the files that have a zero length are required to exist in every Linux container and are automatically bind-mounted from the host into the container when it is first created. All of these files, except for .dockerenv, are critical files that the kernel actually needs to do its job properly. The only file in this container that has any actual size and is related to our application is the statically compiled helloworld binary.

The take-away from this exercise is that your containers are only required to contain exactly what they need to run on the underlying kernel. Everything else is unnecessary. Because it is often useful for troubleshooting to have access to a working shell in your container, people will often compromise and build their images from a very lightweight Linux distribution like Alpine Linux.

To dive into this a little deeper, let’s look at that same container again so that we can dig into the underlying filesystem and compare it with the popular alpine base image.

Although we could easily poke around in the alpine image by simply running docker run -ti alpine:latest /bin/sh, we cannot do this with the adejonge/helloworld image, because it does not contain a shell or SSH. This means that we can’t use ssh, nsenter, or docker exec to examine it. Earlier, we took advantage of the docker export command to create a .tar file that contained a copy of all the files in the container, but this time around we are going to examine the container’s filesystem by connecting directly to the Docker server, and then looking into the container’s filesystem itself. To do this, we need to find out where the image files reside on the server’s disk.

To determine where on the server our files are actually being stored, run docker image inspect on the alpine:latest image:

$ docker image inspect alpine:latest
[
    {
        "Id": "sha256:3fd...353",
        "RepoTags": [
            "alpine:latest"
        ],
        "RepoDigests": [
            "alpine@sha256:7b8...f8b"
        ],
...
        "GraphDriver": {
            "Data": {
                "MergedDir":
                "/var/lib/docker/overlay2/ea8...13a/merged",
                "UpperDir":
                "/var/lib/docker/overlay2/ea8...13a/diff",
                "WorkDir":
                "/var/lib/docker/overlay2/ea8...13a/work"
            },
            "Name": "overlay2"
...
]

And then on the adejonge/helloworld:latest image:

$ docker image inspect adejonge/helloworld:latest
[
    {
        "Id": "sha256:4fa...06d",
        "RepoTags": [
            "adejonge/helloworld:latest"
        ],
        "RepoDigests": [
            "adejonge/helloworld@sha256:46d...a1d"
        ],
...
        "GraphDriver": {
            "Data": {
                "LowerDir":
                "/var/lib/docker/overlay2/37a...84d/diff:
                /var/lib/docker/overlay2/28d...ef4/diff",
                "MergedDir":
                "/var/lib/docker/overlay2/fc9...c91/merged",
                "UpperDir":
                "/var/lib/docker/overlay2/fc9...c91/diff",
                "WorkDir":
                "/var/lib/docker/overlay2/fc9...c91/work"
            },
            "Name": "overlay2"
...
]
Note

In this particular example we are going to use Docker Community Edition running on macOS, but this general approach will work on most modern Docker servers. You’ll just need to access your Docker server via whatever method is easiest.

Since we are using Docker Community Edition, we need to use our nsenter trick to enter the SSH-less virtual machine and explore the filesystem.

$ docker run -it --privileged --pid=host debian nsenter -t 1  -m -u -n -i sh

/ #

Inside the VM, we should now be able to explore the various directories listed in the GraphDriver section of the docker image inspect commands.

In this example, if we look at the first entry for the alpine image, we will see that it is labeled MergedDir and lists the folder /var/lib/docker/overlay2/ea86408b2b15d33ee27d78ff44f82104705286221f055ba1331b58673f4b313a/merged. If we list that directory, we will actually get an error, but from listing the parent directory we quickly discover that we actually want to look at the diff directory.

/ # ls -lFa /var/lib/docker/overlay2/ea...3a/merged

ls: /var/lib/docker/overlay2/ea..3a/merged: No such file or directory

/ # ls -lF /var/lib/docker/overlay2/ea...3a/

total 8
drwxr-xr-x   18 root     root          4096 Mar 15 19:27 diff/
-rw-r--r--    1 root     root            26 Mar 15 19:27 link

/ # ls -lF /var/lib/docker/overlay2/ea...3a/diff

total 64
drwxr-xr-x    2 root     root          4096 Jan  9 19:37 bin/
drwxr-xr-x    2 root     root          4096 Jan  9 19:37 dev/
drwxr-xr-x   15 root     root          4096 Jan  9 19:37 etc/
drwxr-xr-x    2 root     root          4096 Jan  9 19:37 home/
drwxr-xr-x    5 root     root          4096 Jan  9 19:37 lib/
drwxr-xr-x    5 root     root          4096 Jan  9 19:37 media/
drwxr-xr-x    2 root     root          4096 Jan  9 19:37 mnt/
dr-xr-xr-x    2 root     root          4096 Jan  9 19:37 proc/
drwx------    2 root     root          4096 Jan  9 19:37 root/
drwxr-xr-x    2 root     root          4096 Jan  9 19:37 run/
drwxr-xr-x    2 root     root          4096 Jan  9 19:37 sbin/
drwxr-xr-x    2 root     root          4096 Jan  9 19:37 srv/
drwxr-xr-x    2 root     root          4096 Jan  9 19:37 sys/
drwxrwxrwt    2 root     root          4096 Jan  9 19:37 tmp/
drwxr-xr-x    7 root     root          4096 Jan  9 19:37 usr/
drwxr-xr-x   11 root     root          4096 Jan  9 19:37 var/

/ # du -sh  /var/lib/docker/overlay2/ea...3a/diff
4.5M    /var/lib/docker/overlay2/ea...3a/diff

Now alpine happens to be a very small base image, weighing in at only 4.5 MB, and it is actually ideal for building containers on top of. However, we can see that there is still a lot of stuff in this container before we have started to build anything from it.

Now, let’s take a look at the files in the adejonge/helloworld image. In this case, we want to look at the first directory from the LowerDir entry of the docker image inspect output, which you’ll notice also ends in a directory called diff.

/ # ls -lFh /var/lib/docker/overlay2/37...4d/diff

total 3520
-rwxr-xr-x    1 root     root        3.4M Jul  2  2014 helloworld*

/ # exit

You’ll notice that there is only a single file in this directory and it is 3.4 MB. This helloworld binary is the only file shipped in this container and is smaller than the starting size of the alpine image before any application files have been added to it.

Note

It is actually possible to run the helloworld application right from that directory on your Docker server, because it does not require any other files. You really don’t want to do this on anything but a development box, but it can help drive the point home about how useful these types of statically compiled applications can be.

Multistage builds

There is a way you can constrain containers to an even smaller size in many cases: multistage builds. This is, in fact, how we recommend that you build most production containers. You don’t have to worry as much about bringing in extra resources to build your application, and can still run a lean production container. Multistage containers also encourage doing builds inside of Docker, which is a great pattern for repeatability in your build system.

As the author of this helloworld application has written about, the release of multistage build support in Docker itself has made the process of creating small containers much easier than it used to be. In the past, to get what multistage builds now deliver nearly for free, you had to build one image that compiled your code, extract the resulting binary, and then build a second image, without all the build dependencies, into which you would inject that binary. This was often difficult to set up and did not always work out-of-the-box with standard deployment pipelines.

Today, you can now achieve similar results using a Dockerfile as simple as this one:

# Build container
FROM golang:alpine as builder
RUN apk update && \
    apk add git && \
    CGO_ENABLED=0 go get -a -ldflags '-s' github.com/adriaandejonge/helloworld

# Production container
FROM scratch
COPY --from=builder /go/bin/helloworld /helloworld
EXPOSE 8080
CMD ["/helloworld"]

The first thing you’ll notice about this Dockerfile is that it looks a lot like two files that have been combined into one. Indeed this is the case, but there is more to it. The FROM command has been extended, so that you can name the image during the build phase. In this example, the line that reads FROM golang:alpine as builder means that you want to base this build stage on the golang:alpine image and will be referring to this build image/stage as builder.

A few lines down, you’ll see another FROM line, which was not allowed before the introduction of multistage builds. This FROM line uses a special image name, called scratch, that tells Docker to start from an empty image, which includes no additional files. The next line, which reads COPY --from=builder /go/bin/helloworld /helloworld, allows you to copy the binary that you built in the builder image directly into the current image. This will ensure that you end up with the smallest container possible.

Let’s try to build this and see what happens. First, create a directory where you can work and then, using your favorite text editor, paste the content from the preceding example into a file called Dockerfile.

$ mkdir /tmp/multi-build
$ cd /tmp/multi-build
$ vi Dockerfile

We can now start the multistage build.

$ docker build .
Sending build context to Docker daemon  2.048kB
Step 1/6 : FROM golang:alpine as builder
 ---> f421e93ece9c
Step 2/6 : RUN apk update &&     apk add git &&     CGO_ENABLED=0 go get -a \
               -ldflags '-s' github.com/adriaandejonge/helloworld
 ---> Running in 8472fecfb61e
...
OK: 25 MiB in 17 packages
 ---> 074c432f8d99
Removing intermediate container 8472fecfb61e
Step 3/6 : FROM scratch
 --->
Step 4/6 : COPY --from=builder /go/bin/helloworld /helloworld
 ---> 0d995eddb962
Step 5/6 : EXPOSE 8080
 ---> Running in 544d8feedcde
 ---> 0421d47b8884
Removing intermediate container 544d8feedcde
Step 6/6 : CMD /helloworld
 ---> Running in f5284d729d8a
 ---> 8970a3ac4276
Removing intermediate container f5284d729d8a
Successfully built 8970a3ac4276

You’ll notice that the output looks like most other builds and still ends by reporting the successful creation of our final, very minimal image.

Tip

You are not limited to two stages, and in fact, none of the stages need to even be related to each other. They will be run in order. You could, for example, have a stage based on the public Go image that builds your underlying Go application to serve an API, and another stage based on the Angular container to build your frontend web UI. The final stage could then combine outputs from both. If compiling binaries with shared libs, you need to be careful about the underlying OS you are compiling on versus the one you will ship in the container. Alpine Linux is rapidly becoming the common ground here.
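
As a purely hypothetical sketch of that idea (the image tags, directory layout, and build commands are all illustrative, not taken from a real project), such a Dockerfile might look like this:

# Stage 1: build a Go API server (layout is illustrative)
FROM golang:alpine as api-builder
COPY ./api /src/api
WORKDIR /src/api
RUN CGO_ENABLED=0 go build -o /go/bin/api

# Stage 2: build an Angular frontend (layout is illustrative)
FROM node:11.11.0 as ui-builder
COPY ./ui /src/ui
WORKDIR /src/ui
RUN npm install && npm run build

# Final stage: combine the build outputs into one small image
FROM alpine:3.9
COPY --from=api-builder /go/bin/api /api
COPY --from=ui-builder /src/ui/dist /ui
EXPOSE 8080
CMD ["/api"]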

Layers Are Additive

Something that is not apparent until you dig much deeper into how images are built is that the filesystem layers that make up your images are strictly additive in nature. Although you can mask files in previous layers, you cannot delete those files. In practice this means that you cannot make your image smaller by simply deleting files that were generated in earlier steps.

The easiest way to explain this is by using some practical examples. In a new directory, create the following Dockerfile, which will generate an image that launches the Apache web server running on Fedora Linux:

FROM fedora
RUN dnf install -y httpd
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]

and then build it like this:

$ docker build .
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM fedora
 ---> 422dc563ca32
Step 2/3 : RUN dnf install -y httpd
 ---> Using cache
 ---> 44a66eb9f002
Step 3/3 : CMD /usr/sbin/httpd -DFOREGROUND
 ---> Running in 9888e6524355
 ---> 8d29ec43dc5a
Removing intermediate container 9888e6524355
Successfully built 8d29ec43dc5a

Let’s go ahead and tag the resulting image so that you can easily refer to it in subsequent commands:

$ docker tag 8d29ec43dc5a size1

Now let’s take a look at our image with the docker history command. This command will give us some insight into the filesystem layers and build steps that our image uses.

$ docker history size1
IMAGE        CREATED       CREATED BY                                    SIZE
8d29ec43dc5a 5 minutes ago /bin/sh -c #(nop)  CMD ["/usr/sbin/httpd"...  0B
44a66eb9f002 6 minutes ago /bin/sh -c dnf install -y httpd               182MB
422dc563ca32 7 weeks ago   /bin/sh -c #(nop) ADD file:3b33f77d83b76f3... 252MB
<missing>    7 weeks ago   /bin/sh -c #(nop)  ENV DISTTAG=f27containe... 0B
<missing>    2 months ago  /bin/sh -c #(nop)  MAINTAINER [Adam Miller... 0B

You’ll notice that three of the layers added no size to our final image, but two of them added a great deal of size. The layer that is 252 MB makes sense, as this is the base Fedora image that includes a minimal Linux distribution; however, the 182 MB layer is surprising. The Apache web server shouldn’t be nearly that large, so what’s going on here, exactly?

If you have experience with package managers like apk, apt, dnf, or yum, then you may know that most of these tools rely heavily on a large cache that includes details about all the packages that are available for installation on the platform in question. This cache uses up a huge amount of space and is completely useless once you have installed the packages you need. The most obvious next step is to simply delete the cache. On Fedora systems, you could do this by editing your Dockerfile so that it looks like this:

FROM fedora
RUN dnf install -y httpd
RUN dnf clean all
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]

and then building, tagging, and examining the resulting image:

$ docker build .
Sending build context to Docker daemon  2.048kB
...
Successfully built 6ad08c11b3f5
$ docker tag 6ad08c11b3f5 size2
$ docker history size2
IMAGE        CREATED        CREATED BY                                   SIZE
6ad08c11b3f5 About a minute /bin/sh -c #(nop)  CMD ["/usr/sbin/httpd...  0B
29b7f0acbb91 About a minute /bin/sh -c dnf clean all                     1.84MB
44a66eb9f002 24 minutes ago /bin/sh -c dnf install -y httpd              182MB
422dc563ca32 7 weeks ago    /bin/sh -c #(nop) ADD file:3b33f77d83b76f... 252MB
<missing>    7 weeks ago    /bin/sh -c #(nop)  ENV DISTTAG=f27contain... 0B
<missing>    2 months ago   /bin/sh -c #(nop)  MAINTAINER [Adam Mille... 0B

If you look carefully at the output from the docker history command, you’ll notice that you have created a new layer that adds 1.84 MB to the image, but you have not decreased the size of the problematic layer at all. What is happening exactly?

The important thing to understand is that image layers are strictly additive in nature. Once a layer is created, nothing can be removed from it. This means that you cannot make earlier layers in an image smaller by deleting files in subsequent layers. When you delete or edit files in subsequent layers, you’re simply masking the older version with the modified or removed version in the new layer. This means that the only way you can make a layer smaller is by removing files before you save the layer.

The most common way to deal with this is by stringing commands together on a single Dockerfile line. You can do this easily by taking advantage of the && operator. This operator acts like a Boolean AND statement and basically translates into English as “and if the previous command ran successfully, run this command.” In addition, you can take advantage of the \ line-continuation character, which indicates that the command continues on the next line. This can help improve the readability of long commands.

With this knowledge in hand, you can rewrite the Dockerfile like this:

FROM fedora
RUN dnf install -y httpd && \
    dnf clean all
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]

Now you can rebuild the image and see how this change has impacted the size of the layer that installs the Apache web server (httpd):

$ docker build .
Sending build context to Docker daemon  2.048kB
...
Successfully built b064ee5618c9
$ docker tag b064ee5618c9 size3
$ docker history size3
IMAGE        CREATED        CREATED BY                                   SIZE
b064ee5618c9 52 seconds ago /bin/sh -c #(nop)  CMD ["/usr/sbin/http...   0B
7eb37b2de3cf 53 seconds ago /bin/sh -c dnf install -y httpd &&     ...   17.7MB
422dc563ca32 2 months ago   /bin/sh -c #(nop) ADD file:3b33f77d83b7...   252MB
<missing>    2 months ago   /bin/sh -c #(nop)  ENV DISTTAG=f27conta...   0B
<missing>    2 months ago   /bin/sh -c #(nop)  MAINTAINER [Adam Mil...   0B

In the first two examples, the layer in question was 182 MB in size, but now that you have removed many unnecessary files that were added to that layer, you are able to shrink the layer all the way down to 17.7 MB. That is a significant space savings, especially when you consider how many servers might be pulling the image down during any given deployment. Of course, 252 MB is still pretty big for a production base layer, but hopefully you get the idea.
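
Since you tagged each build along the way, you can also compare the total size of the three images side by side:

$ docker images size1
$ docker images size2
$ docker images size3

The SIZE column reported for size3 should be noticeably smaller than the one for size1 or size2, because an image’s reported size is essentially the sum of the sizes of its layers.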

Optimizing for the Cache

The final building technique that we will cover here is related to keeping build times as fast as possible. One of the important goals of the DevOps movement is to keep feedback loops as tight as possible. This means that it is important to try to ensure that problems are discovered and reported as quickly as possible so that they can be fixed when people are still completely focused on the code in question and haven’t moved on to other unrelated tasks.

During any standard build process, Docker uses a layer cache to try to avoid rebuilding any image layers that it has already built and that do not contain any noticeable changes. Because of this cache, the order in which you do things inside your Dockerfile can have a dramatic impact on how long your builds take on average.

For starters let’s take the Dockerfile from the previous example and customize it just a bit, so that it looks like this:

FROM fedora
RUN dnf install -y httpd && \
    dnf clean all
RUN mkdir /var/www && \
    mkdir /var/www/html
ADD index.html /var/www/html
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]

Now, in the same directory, let’s also create a new file called index.html that looks like this:

<html>
  <head>
    <title>My custom Web Site</title>
  </head>
  <body>
    <p>Welcome to my custom Web Site</p>
  </body>
</html>

For the first test, let’s time the build without using the Docker cache at all, by using the following command:

$ time docker build --no-cache .
Sending build context to Docker daemon  3.072kB
...
Step 2/5 : RUN dnf install -y httpd &&     dnf clean all
...
Step 3/5 : RUN mkdir /var/www &&     mkdir /var/www/html
 ---> Running in 3bfc31e235da
Removing intermediate container 3bfc31e235da
 ---> a21db49f0442
Step 4/5 : ADD index.html /var/www/html
 ---> ff677486fe44
...
Successfully built 8798731d106c

real 1m7.611s
user 0m0.088s
sys  0m0.074s

Tip

Windows users should be able to use the PowerShell Measure-Command cmdlet to replace the Unix time command used in these examples.

The output from the time command tells us that the build without the cache took about one minute and seven seconds. If you rebuild the image immediately afterward, and allow Docker to use the cache, you will see that the build is very fast.

$ time docker build .
Sending build context to Docker daemon  3.072kB
Step 1/5 : FROM fedora
 ---> 422dc563ca32
Step 2/5 : RUN dnf install -y httpd &&     dnf clean all
 ---> Using cache
 ---> 88ae32ca622c
Step 3/5 : RUN mkdir /var/www &&     mkdir /var/www/html
 ---> Using cache
 ---> c3dc1fc9eb8b
Step 4/5 : ADD index.html /var/www/html
 ---> Using cache
 ---> d3dcfe6bc6d6
Step 5/5 : CMD ["/usr/sbin/httpd", "-DFOREGROUND"]
 ---> Using cache
 ---> 1974815809c4
Successfully built 1974815809c4

real 0m0.266s
user 0m0.051s
sys  0m0.052s

Since none of the layers changed and the cache could be fully leveraged, the build took only a fraction of a second to complete. Now, let’s make a small improvement to the index.html file so that it looks like this:

<html>
  <head>
    <title>My custom Web Site</title>
  </head>
  <body>
    <div align="center">
      <p>Welcome to my custom Web Site!!!</p>
    </div>
  </body>
</html>

and then let’s time the rebuild again:

$ time docker build .
Sending build context to Docker daemon  3.072kB
Step 1/5 : FROM fedora
 ---> 422dc563ca32
Step 2/5 : RUN dnf install -y httpd &&     dnf clean all
 ---> Using cache
 ---> 88ae32ca622c
Step 3/5 : RUN mkdir /var/www &&     mkdir /var/www/html
 ---> Using cache
 ---> c3dc1fc9eb8b
Step 4/5 : ADD index.html /var/www/html
 ---> dec032325309
Step 5/5 : CMD ["/usr/sbin/httpd", "-DFOREGROUND"]
 ---> Running in ea4e4d246cda
Removing intermediate container ea4e4d246cda
 ---> 59f4f67cd756
Successfully built 59f4f67cd756

real  0m0.933s
user  0m0.050s
sys  0m0.048s

If you look at the output carefully, you will see that the cache was used for most of the build. It wasn’t until step 4, when Docker needed to copy index.html, that the cache was invalidated and the remaining layers had to be recreated. Because the cache could be used for most of the build, it still took less than a second to complete.

But what would happen if you changed the order of the commands in the Dockerfile so that they looked like this:

FROM fedora
RUN mkdir /var/www && \
    mkdir /var/www/html
ADD index.html /var/www/html
RUN dnf install -y httpd && \
    dnf clean all
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]

Let’s quickly time another test build without the cache to get a baseline:

$ time docker build --no-cache .
Sending build context to Docker daemon  3.072kB
...
Successfully built c949ed0f036e

real 0m48.824s
user 0m0.076s
sys  0m0.066s

In this case, the build took 48 seconds to complete. The difference in time from the very first test is entirely due to fluctuating network speeds and has nothing to do with the changes that you have made to the Dockerfile.

Now, let’s edit index.html again like so:

<html>
  <head>
    <title>My custom Web Site</title>
  </head>
  <body>
    <div align="center" style="font-size:180%">
      <p>Welcome to my custom Web Site</p>
    </div>
  </body>
</html>

And now, let’s time the image rebuild, while using the cache:

$ time docker build .
Sending build context to Docker daemon  3.072kB
Step 1/5 : FROM fedora
 ---> 422dc563ca32
Step 2/5 : RUN mkdir /var/www &&     mkdir /var/www/html
 ---> Using cache
 ---> 04bee02fe0ba
Step 3/5 : ADD index.html /var/www/html
 ---> acb3cb2e9cee
Step 4/5 : RUN dnf install -y httpd &&     dnf clean all
 ---> Running in cc569be95c12
...
Successfully built 6597845f8514

real 0m48.511s
user 0m0.077s
sys  0m0.066s

The first time that you rebuilt the image after editing the index.html file, it took only 0.933 seconds, but this time it took 48.511 seconds, almost exactly as long as it took to build the whole image without using the cache at all.

This is because you have modified the Dockerfile so that the index.html file is copied into the image very early in the process. The problem with doing it this way is that the index.html file changes frequently and will often invalidate the cache. The other issue is that it is unnecessarily placed before a very time-consuming step in our Dockerfile: installing the Apache web server.

The important lesson to take away from all of this is that order matters, and in general, you should always try to order your Dockerfile so that the most stable and time-consuming portions of your build process happen first and your code is added as late in the process as possible.

For projects that require you to install dependencies based on your code using tools like npm and bundle, it is also a good idea to do some research about optimizing your Docker builds for those platforms. This often includes locking down your dependency versions and storing them along with your code so that they do not need to be downloaded for each and every build.
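
As a rough sketch of what this looks like for a Node.js project, you can copy the dependency manifests into the image and install packages before you copy the rest of your code. That way, the expensive npm install layer is rebuilt only when package.json or package-lock.json actually changes, not on every code edit. The file names and WORKDIR used here are just conventional choices:

FROM node:lts

WORKDIR /app

# Copy only the dependency manifests first so this layer
# stays cached until the dependencies themselves change.
COPY package.json package-lock.json ./
RUN npm install

# Now copy the application code, which changes far more often.
COPY . .

CMD ["node", "index.js"]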

Wrap-Up

At this point you should feel comfortable with the basic day-to-day activities around image creation for Docker. In the next chapter, we will start to dig into how you can use your images to create containerized processes for your projects.
