Chapter 4. Orchestration

With the cattle approach to managing infrastructure, you don’t manually allocate certain machines for running an application. Instead, you leave it up to an orchestrator to manage the life cycle of your containers. In Figure 4-1, you can see that container orchestration includes a range of functions, including but not limited to:

  • Organizational primitives, such as labels in Kubernetes, to query and group containers (see the sketch following this list)

  • Scheduling of containers to run on a host

  • Automated health checks to determine if a container is alive and ready to serve traffic and to relaunch it if necessary

  • Autoscaling (that is, increasing or decreasing the number of containers based on utilization or higher-level metrics)

  • Upgrade strategies, from rolling updates to more sophisticated techniques such as A/B and canary deployments

  • Service discovery to determine which host a scheduled container ended up on, usually including DNS support
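To make the first of these concrete: plain Docker already lets you attach and query labels, and Kubernetes, swarm mode, and the other orchestrators offer analogous (and richer) selectors. A minimal sketch, using an arbitrary label key and value:

    # Attach an organizational label when launching a container
    $ docker run -d --label env=prod --name web nginx

    # Later, query all containers carrying that label
    $ docker ps --filter "label=env=prod"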

Figure 4-1. Orchestration and its constituents

Sometimes considered part of orchestration but outside the scope of this book is the topic of base provisioning—that is, installing or upgrading the local operating system on a node or setting up the container runtime there.

Service discovery (covered in greater detail in Chapter 5) and scheduling are really two sides of the same coin. The scheduler decides where in a cluster a container is placed and supplies other parts of the system with an up-to-date mapping in the form containers -> locations. This mapping can then be represented in various ways, be it in a distributed key-value store such as etcd, via DNS, or through environment variables.
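For example, a scheduler (or an agent acting on its behalf) could publish that mapping to etcd for other components to consume. The key layout and addresses below are purely illustrative, not the convention of any particular orchestrator, and assume the v3 etcdctl API:

    # Record where two instances of "webserver" were placed
    $ etcdctl put /discovery/webserver/instance-1 '10.0.1.5:31001'
    $ etcdctl put /discovery/webserver/instance-2 '10.0.2.7:31017'

    # Consumers look up all current locations under the service prefix
    $ etcdctl get --prefix /discovery/webserver/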

In this chapter we will discuss networking and service discovery from the point of view of the following container orchestration solutions: Docker Swarm and swarm mode, Apache Mesos, and HashiCorp Nomad. These three are (along with Kubernetes, which we will cover in detail in Chapter 7) alternatives your organization may already be using, and hence, for the sake of completeness, it’s worth exploring them here. To be clear, though, as of early 2018 the industry has standardized on Kubernetes as the portable way of doing container orchestration.

Note

In addition to the three orchestrators discussed in this chapter, there are other solutions out there you could have a look at, including Facebook’s Bistro or hosted (closed source) solutions such as Amazon ECS.

Should you want to more fully explore the topic of distributed system scheduling, I suggest reading Google’s research papers on Borg and Omega.

Before we dive into container orchestration systems, though, let’s step back and review what the scheduler—which is the core component of orchestration—actually does in the context of containerized workloads.

What Does a Scheduler Actually Do?

A scheduler for a distributed system takes an application—binary or container image—as requested by a user and places it on one or more of the available hosts. For example, a user might request to have 100 instances of the app running, so the scheduler needs to find space (CPU, RAM) to run these 100 instances on the available hosts.

In the case of a containerized setup, this means that the respective container image must exist on a host (if not, it must be pulled from a container registry first), and the scheduler must instruct the container runtime on that host to launch a container based on the image.

Let’s look at a concrete example. In Figure 4-2, you can see that the user requested three instances of the app running in the cluster. The scheduler decides the placement based on its knowledge of the state of the cluster. The cluster state may include the utilization of the machines, the resources necessary to successfully launch the app, and constraints such as launch this app only on a machine that is SSD-backed.
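To give you an idea of what such a constraint looks like in practice, here is a sketch using Docker swarm mode (covered later in this chapter); the disk label is an assumption you would have to apply to your nodes yourself:

    # Mark a node as SSD-backed (run against a manager node)
    $ docker node update --label-add disk=ssd worker-1

    # Only place the service's tasks on nodes carrying that label
    $ docker service create --name db --replicas 3 \
        --constraint 'node.labels.disk == ssd' redis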

Further, quality of service might be taken into account for the placement decision; see Michael Gasch’s great article “QoS, Node allocatable and the Kubernetes Scheduler” for more details.

Figure 4-2. Distributed system scheduler in action

If you want to learn more about scheduling in distributed systems I suggest you check out the excellent resource “Cluster Management at Google” by John Wilkes.

Warning

Beware of the semantics of constraints that you can place on scheduling containers. For example, I once gave a demo using Marathon that wouldn’t work as planned because I screwed up the placement constraints: I used a combination of unique hostname and a certain role, and it wouldn’t scale because there was only one node with the specified role in the cluster. The same thing can happen with Kubernetes labels.
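To illustrate the pitfall with a hypothetical Marathon app definition (the Marathon URL and the role agent attribute are placeholders): combining a UNIQUE hostname constraint with a CLUSTER constraint on an attribute that only one agent carries means the app can never scale past one instance; the scheduler simply keeps waiting for offers that satisfy both.

    # Hypothetical Marathon app: every instance must run on a distinct host
    # AND on a host with the agent attribute role=frontend. If only one
    # agent has that attribute, scaling to 5 instances will never complete.
    $ curl -X POST http://marathon.example.com:8080/v2/apps \
        -H 'Content-Type: application/json' -d '{
          "id": "/constraint-demo",
          "cmd": "sleep 3600",
          "cpus": 0.1,
          "mem": 32,
          "instances": 5,
          "constraints": [
            ["hostname", "UNIQUE"],
            ["role", "CLUSTER", "frontend"]
          ]
        }'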

Docker

At the time of writing, Docker uses the so-called swarm mode in a distributed setting, whereas prior to Docker 1.12 the standalone Docker Swarm tool was used. We will discuss both here.

Swarm Mode

Since Docker 1.12, swarm mode has been integrated with Docker Engine. The orchestration features embedded in Docker Engine are built using SwarmKit.

A swarm in Docker consists of multiple hosts running in swarm mode and acting as managers and workers—hosts can be managers, workers, or perform both roles at once. A task is a running container that is part of a swarm service and managed by a swarm manager, as opposed to a standalone container. A service in the context of Docker swarm mode is a definition of the tasks to execute on the manager or worker nodes. Docker works to maintain that desired state; for example, if a worker node becomes unavailable, Docker schedules the tasks onto another host.

Docker running in swarm mode doesn’t prevent you from running standalone containers on any of the hosts participating in the swarm. The essential difference between standalone containers and swarm services is that only swarm managers can manage a swarm, while standalone containers can be started on any host.
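As a minimal sketch of that workflow (the addresses, the worker token, and the nginx image are placeholders):

    # On the first host: turn it into a swarm manager
    $ docker swarm init --advertise-addr 10.0.0.1

    # On the other hosts: join as workers (token printed by `swarm init`)
    $ docker swarm join --token <worker-token> 10.0.0.1:2377

    # On the manager: declare a service with a desired state of 3 replicas
    $ docker service create --name web --replicas 3 --publish 8080:80 nginx

    # Check where the tasks ended up
    $ docker service ps web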

To learn more about Docker’s swarm mode, check out the official “Getting Started with Swarm Mode” tutorial or check out the Katacoda “Docker Orchestration – Getting Started with Swarm Mode” scenario.

Docker Swarm

Docker historically had a native clustering tool called Docker Swarm. Docker Swarm builds upon the Docker API1 and works as follows: there’s one Swarm manager that’s responsible for scheduling, and on each host an agent runs that takes care of the local resource management (Figure 4-3).

Figure 4-3. Docker Swarm architecture, based on the T-Labs presentation “Swarm – A Docker Clustering System”

Docker Swarm supports different backends: etcd, Consul, and ZooKeeper. You can also use a static file to capture your cluster state with Swarm, and recently a DNS-based service discovery tool for Swarm called wagl has been introduced.
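A typical standalone Swarm setup (now legacy) looked roughly like the following, with Consul as the discovery backend; the IP addresses and ports are placeholders:

    # Start the Swarm manager, storing cluster state in Consul
    $ docker run -d -p 4000:4000 swarm manage -H :4000 \
        --advertise <manager-ip>:4000 consul://<consul-ip>:8500

    # On every host: run the Swarm agent, registering the local Docker daemon
    $ docker run -d swarm join --advertise=<node-ip>:2375 consul://<consul-ip>:8500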

Note

Out of the box, Docker provides a basic service discovery mechanism for single-node deployments called Docker links. Linking allows a user to let any container discover both the IP address and exposed ports of other Docker containers on the same host. In order to accomplish this, Docker provides the --link flag. But hard-wiring of links between containers is neither fun nor scalable. In fact, it’s so bad that this feature has been deprecated.
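For completeness, this is what linking looked like before its deprecation (the image name my-app is a placeholder):

    # Launch the container to be discovered
    $ docker run -d --name db redis

    # Link it into a second container: Docker injects DB_* environment
    # variables and an /etc/hosts entry for "db" into the app container
    $ docker run -d --name app --link db:db my-app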

Apache Mesos

Apache Mesos (Figure 4-4) is a general-purpose cluster resource manager that abstracts the resources of a cluster (CPU, RAM, etc.) in such a way that the cluster appears like one giant computer to the developer. In a sense, Mesos acts like the kernel of a distributed operating system. It is hence never used on its own, but always together with so-called frameworks such as Marathon (for long-running stuff like a web server) or Chronos (for batch jobs), or big data and fast data frameworks like Apache Spark or Apache Cassandra.

Figure 4-4. Apache Mesos architecture at a glance

Mesos supports both containerized workloads (that is, running Docker containers) and plain executables (for example, bash scripts or Linux ELF format binaries for both stateless and stateful services).

In the following discussion, I’m assuming you’re familiar with Mesos and its terminology. If you’re new to Mesos, I suggest checking out David Greenberg’s wonderful book Building Applications on Mesos (O’Reilly), a gentle introduction to this topic that’s particularly useful for distributed application developers.

The networking characteristics and capabilities mainly depend on the Mesos containerizer used:

  • For the Mesos containerizer there are a few prerequisites, such as having a Linux Kernel version > 3.16 and libnl installed. You can then build a Mesos agent with network isolator support enabled. At launch, you would use something like the following:

    $ mesos-slave --containerizer=mesos \
        --isolation=network/port_mapping \
        --resources="ports:[31000-32000];ephemeral_ports:[33000-35000]"

    This would configure the Mesos agent to use nonephemeral ports in the range from 31,000 to 32,000 and ephemeral ports in the range from 33,000 to 35,000. All containers share the host’s IP, and the port ranges are spread over the containers (with a 1:1 mapping between destination port and container ID). With the network isolator, you also can define performance limitations such as bandwidth, and it enables you to perform per-container monitoring of the network traffic. See Jie Yu’s MesosCon 2015 talk “Per Container Network Monitoring and Isolation in Mesos” for more details on this topic.

  • For the Docker containerizer, see Chapter 2.

Note that Mesos has supported IP-per-container since version 0.23. If you want to learn more about Mesos networking, check out Christos Kozyrakis and Spike Curtis’s “Mesos Networking” talk from MesosCon 2015.

While Mesos is not opinionated about service discovery, there is a Mesos-specific solution that is often used in practice: Mesos-DNS (see “Pure-Play DNS-Based Solutions”). There is also a multitude of emerging solutions, such as traefik (see “Wrapping It Up”), that are integrated with Mesos and gaining traction.

Note

Because Mesos-DNS is the recommended default service discovery mechanism with Mesos, it’s important to pay attention to how Mesos-DNS represents tasks. For example, a running task might have the (logical) service name webserver.marathon.mesos, and you can find out the port allocations via DNS SRV records.
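Assuming Mesos-DNS serves the default mesos domain and a Marathon app named webserver is running, the lookups would look roughly like this:

    # A record: the IP(s) of the agent(s) the task runs on
    $ dig webserver.marathon.mesos +short

    # SRV record: additionally reveals the allocated host port(s)
    $ dig _webserver._tcp.marathon.mesos SRV +short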

If you want to try out Mesos online for free you can use the Katacoda “Deploying Containers to DC/OS” scenario.

HashiCorp Nomad

Nomad is a cluster scheduler by HashiCorp, the makers of Vagrant. It was introduced in September 2015 and primarily aims at simplicity: it is meant to be easy to install and use. Its scheduler design is reportedly inspired by Google’s Omega, borrowing concepts such as having a global state of the cluster as well as employing an optimistic, concurrent scheduler.

Nomad has an agent-based architecture with a single binary that can take on different roles, supporting rolling upgrades as well as draining nodes for rebalancing. Nomad uses a strongly consistent consensus protocol for all state replication and scheduling, and a gossip protocol to manage the addresses of servers, which enables automatic clustering and multiregion federation. In Figure 4-5, you can see Nomad’s architecture:

  • Servers are responsible for accepting jobs from users, managing clients, and computing task placements.

  • Clients (one per VM instance) are responsible for interacting with the tasks or applications contained within a job. They work in a pull-based manner; that is, they register with the server and then they poll it periodically to watch for pending work.

Figure 4-5. Nomad architecture

Jobs in Nomad are defined in a HashiCorp-proprietary format called HCL or in JSON, and Nomad offers a command-line interface as well as an HTTP API to interact with the server process. Nomad models infrastructure as regions and datacenters. Regions may contain multiple datacenters, depending on what scale you are operating at. You can think of a datacenter like a zone in AWS, Azure, or Google Cloud (say, us-central1-b), and a region might be something like Iowa (us-central1).
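To give a feel for the workflow (exact command names vary slightly across Nomad versions, and the job name is arbitrary):

    # Generate a skeleton job file (HCL) to edit
    $ nomad init

    # Submit the job; the servers compute placements and clients pull the work
    $ nomad run example.nomad

    # Inspect the job and its allocations (placed tasks)
    $ nomad status example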

I’m assuming you’re familiar with Nomad and its terminology. If not, I suggest you watch “Nomad: A Distributed, Optimistically Concurrent Scheduler: Armon Dadgar, HashiCorp”, a nice introduction to Nomad, and also read the well-done docs.

To try out Nomad, use the UI Demo HashiCorp provides or try it out online for free using the Katacoda “Introduction to Nomad” scenario.

Nomad comes with a couple of so-called task drivers, from general-purpose exec to java to qemu and docker. For the docker driver, Nomad requires, at the time of this writing, Docker version 1.10 or greater. It uses port binding to expose services running in containers via the port space of the host’s interface, and it supports both automatic (dynamically allocated) and manual (user-specified) port mapping for Docker containers, covering both TCP and UDP.

For more details on networking options, such as mapping ports and using labels, see the documentation.

With v0.2, Nomad introduced a Consul-based (see “Consul”) service discovery mechanism. It includes health checks and assumes that tasks running inside Nomad can reach the Consul agent, which can pose a challenge for containers using bridge mode networking.
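Once a task is registered, other services can discover it, for example via Consul’s DNS interface; the web service name here is hypothetical and the agent is assumed to listen on its default DNS port 8600:

    # Ask the local Consul agent for healthy instances of the "web" service
    $ dig @127.0.0.1 -p 8600 web.service.consul SRV +short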

Community Matters

An important aspect you’ll want to consider when selecting an orchestration system is the community behind and around it.2 Here are a few indicators and metrics you can use:

  • Is the governance backed by a formal entity or process, such as the Apache Software Foundation (ASF) or the Linux Foundation (LF)?

  • How active are the mailing list, the IRC channel, the bug/issue tracker, the Git repo (number of patches or pull requests), and other community initiatives?

    Take a holistic view, but make sure that you actually pay attention to the activity there. Healthy and hopefully growing communities will tend to have high participation in at least one of these areas.

  • Is the orchestration tool (implicitly) controlled by a single entity? For example, in the case of Nomad HashiCorp is in control, for Apache Mesos it’s mainly Mesosphere (and to some extent Twitter), etc.

  • Are multiple independent providers and support channels available? For example, you can run Kubernetes in many different environments and get help from many (commercial) organizations as well as individuals on Slack, mailing lists, or forums.

Wrapping It Up

As of early 2018, Kubernetes (discussed in Chapter 7) can be considered the de facto container orchestration standard. All major providers, including Docker and DC/OS (Mesos), support Kubernetes.

Next, we’ll move on to service discovery, a vital part of container orchestration.

1 Essentially, this means that you can simply keep using docker run commands and the deployment of your containers in a cluster happens automagically.

2 Now, you might argue that this is not specific to the container orchestration domain but a general OSS issue, and you’d be right. Still, I believe it is important enough to mention it, as many people are new to this area and can benefit from these insights.
