Chapter 4. Orchestration
With the cattle approach to managing infrastructure, you don't manually allocate certain machines for running an application. Instead, you leave it up to an orchestrator to manage the life cycle of your containers. In Figure 4-1, you can see that container orchestration includes a range of functions, including but not limited to:
- Organizational primitives, such as labels in Kubernetes, to query and group containers
- Scheduling of containers to run on a host
- Automated health checks to determine if a container is alive and ready to serve traffic, and to relaunch it if necessary
- Autoscaling (that is, increasing or decreasing the number of containers based on utilization or higher-level metrics)
- Upgrade strategies, from rolling updates to more sophisticated techniques such as A/B and canary deployments
- Service discovery to determine which host a scheduled container ended up on, usually including DNS support
Sometimes considered part of orchestration but outside the scope of this book is the topic of base provisioning: that is, installing or upgrading the local operating system on a node or setting up the container runtime there.
Service discovery (covered in greater detail in Chapter 5) and scheduling are really two sides of the same coin. The scheduler decides where in a cluster a container is placed and supplies other parts with an up-to-date mapping of the form containers -> locations. This mapping can then be represented in various ways, be it in a distributed key-value store such as etcd, via DNS, or through environment variables.
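To make the key-value representation concrete, here is a minimal sketch using etcd's v2-style etcdctl commands. The /services/web prefix, instance names, and addresses are hypothetical; in practice the orchestrator (or an agent acting on its behalf) writes these entries, and consumers read the prefix to resolve locations:

# Scheduler/agent records where each container of the "web" service landed (hypothetical keys)
$ etcdctl set /services/web/instance-1 '10.1.2.3:8080'
$ etcdctl set /services/web/instance-2 '10.1.7.9:8080'

# A consumer resolves the current containers -> locations mapping
$ etcdctl ls /services/web
$ etcdctl get /services/web/instance-1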
In this chapter we will discuss networking and service discovery from the point of view of the following container orchestration solutions: Docker Swarm and swarm mode, Apache Mesos, and HashiCorp Nomad. These three are (along with Kubernetes, which we will cover in detail in Chapter 7) alternatives your organization may already be using, and hence, for the sake of completeness, it's worth exploring them here. To be clear, though, as of early 2018 the industry has standardized on Kubernetes as the portable way of doing container orchestration.
Note
In addition to the three orchestrators discussed in this chapter, there are other (closed source) solutions out there you could have a look at, including Facebook's Bistro or hosted solutions such as Amazon ECS.
Should you want to more fully explore the topic of distributed system scheduling, I suggest reading Google's research papers on Borg and Omega.
Before we dive into container orchestration systems, though, let's step back and review what the scheduler, which is the core component of orchestration, actually does in the context of containerized workloads.
What Does a Scheduler Actually Do?
A scheduler for a distributed system takes an application (binary or container image) as requested by a user and places it on one or more of the available hosts. For example, a user might request to have 100 instances of the app running, so the scheduler needs to find space (CPU, RAM) to run these 100 instances on the available hosts.
In the case of a containerized setup, this means that the respective container image must exist on a host (if not, it must be pulled from a container registry first), and the scheduler must instruct the container runtime on that host to launch a container based on the image.
Let's look at a concrete example. In Figure 4-2, you can see that the user requested three instances of the app running in the cluster. The scheduler decides the placement based on its knowledge of the state of the cluster. The cluster state may include the utilization of the machines, the resources necessary to successfully launch the app, and constraints such as "launch this app only on a machine that is SSD-backed."
Further, quality of service might be taken into account for the placement decision; see Michael Gasch's great article "QoS, Node allocatable and the Kubernetes Scheduler" for more details.
If you want to learn more about scheduling in distributed systems, I suggest you check out the excellent resource "Cluster Management at Google" by John Wilkes.
Warning
Beware of the semantics of constraints that you can place on scheduling containers. For example, I once gave a demo using Marathon that wouldn't work as planned because I screwed up the placement constraints: I used a combination of unique hostname and a certain role, and it wouldn't scale because there was only one node with the specified role in the cluster. The same thing can happen with Kubernetes labels.
Docker
At the time of writing, Docker uses the so-called swarm mode in a distributed setting, whereas prior to Docker 1.12 the standalone Docker Swarm model was used. We will discuss both here.
Swarm Mode
Since Docker 1.12, swarm mode has been integrated with Docker Engine. The orchestration features embedded in Docker Engine are built using SwarmKit.
A swarm in Docker consists of multiple hosts running in swarm mode and acting as managers and workers; hosts can be managers, workers, or perform both roles at once. A task is a running container that is part of a swarm service and managed by a swarm manager, as opposed to a standalone container. A service in the context of Docker swarm mode is a definition of the tasks to execute on the manager or worker nodes. Docker works to maintain that desired state; for example, if a worker node becomes unavailable, Docker schedules the tasks onto another host.
Docker running in swarm mode doesn't prevent you from running standalone containers on any of the hosts participating in the swarm. The essential difference between standalone containers and swarm services is that only swarm managers can manage a swarm, while standalone containers can be started on any host.
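To make this concrete, here is a minimal sketch of creating a swarm and a replicated service with the standard Docker CLI; the addresses, the service name web, and the nginx image are placeholders, not prescribed values:

# On the first host: initialize a swarm (this node becomes a manager)
$ docker swarm init --advertise-addr 192.0.2.10

# On each additional host: join as a worker, using the token printed by "swarm init"
$ docker swarm join --token <worker-token> 192.0.2.10:2377

# On a manager: declare the desired state, here three replicas of an nginx-based service
$ docker service create --name web --replicas 3 --publish 8080:80 nginx

# Inspect where the tasks (containers) were scheduled
$ docker service ps web

If a worker later becomes unavailable, the manager reschedules that worker's tasks elsewhere to restore the declared replica count.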
To learn more about Docker's swarm mode, check out the official "Getting Started with Swarm Mode" tutorial or the Katacoda "Docker Orchestration - Getting Started with Swarm Mode" scenario.
Docker Swarm
Docker historically had a native clustering tool called Docker Swarm. Docker Swarm builds upon the Docker API1 and works as follows: there's one Swarm manager that's responsible for scheduling, and on each host an agent runs that takes care of the local resource management (Figure 4-3).
Docker Swarm supports different backends: etcd, Consul, and ZooKeeper. You can also use a static file to capture your cluster state with Swarm, and recently a DNS-based service discovery tool for Swarm called wagl has been introduced.
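For illustration, a standalone Swarm setup with a Consul backend looked roughly like the following; the addresses and ports are placeholders, and exact flags varied across Swarm releases, so treat this as a sketch rather than a reference:

# On each host: run the Swarm agent, registering the local Docker daemon in Consul
$ docker run -d swarm join --advertise=192.0.2.21:2375 consul://192.0.2.10:8500

# On the manager host: run the Swarm manager against the same Consul backend
$ docker run -d -p 4000:4000 swarm manage -H :4000 consul://192.0.2.10:8500

# Clients then point the Docker CLI at the Swarm manager instead of a single daemon
$ docker -H tcp://192.0.2.20:4000 ps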
Note
Out of the box, Docker provides a basic service discovery mechanism for single-node deployments called Docker links. Linking allows a user to let any container discover both the IP address and exposed ports of other Docker containers on the same host. In order to accomplish this, Docker provides the --link flag. But hard-wiring of links between containers is neither fun nor scalable. In fact, it's so bad that this feature has been deprecated.
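For completeness, this is roughly what linking looked like before its deprecation; the container names and image are arbitrary. The link injects the target container's address into the consuming container via environment variables and an /etc/hosts entry:

# Start a container to be discovered
$ docker run -d --name db redis

# Link it into a second container under the alias "db" and inspect the injected variables
$ docker run --rm --link db:db alpine env | grep DB_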
Apache Mesos
Apache Mesos (Figure 4-4) is a general-purpose cluster resource manager that abstracts the resources of a cluster (CPU, RAM, etc.) in such a way that the cluster appears like one giant computer to the developer. In a sense, Mesos acts like the kernel of a distributed operating system. It is hence never used on its own, but always together with so-called frameworks such as Marathon (for long-running stuff like a web server) or Chronos (for batch jobs), or big data and fast data frameworks like Apache Spark or Apache Cassandra.
Mesos supports both containerized workloads (that is, running Docker containers) and plain executables (for example, bash scripts or Linux ELF format binaries for both stateless and stateful services).
In the following discussion, I'm assuming you're familiar with Mesos and its terminology. If you're new to Mesos, I suggest checking out David Greenberg's wonderful book Building Applications on Mesos (O'Reilly), a gentle introduction to this topic that's particularly useful for distributed application developers.
The networking characteristics and capabilities mainly depend on the Mesos containerizer used:
- For the Mesos containerizer there are a few prerequisites, such as a Linux kernel version > 3.16 and libnl installed. You can then build a Mesos agent with network isolator support enabled. At launch, you would use something like the following:

$ mesos-slave --containerizer=mesos \
    --isolation=network/port_mapping \
    --resources="ports:[31000-32000];ephemeral_ports:[33000-35000]"

This would configure the Mesos agent to use nonephemeral ports in the range from 31,000 to 32,000 and ephemeral ports in the range from 33,000 to 35,000. All containers share the host's IP, and the port ranges are spread over the containers (with a 1:1 mapping between destination port and container ID). With the network isolator, you can also define performance limitations such as bandwidth, and it enables you to perform per-container monitoring of the network traffic. See Jie Yu's MesosCon 2015 talk "Per Container Network Monitoring and Isolation in Mesos" for more details on this topic.
- For the Docker containerizer, see Chapter 2.
Note that Mesos supports IP-per-container since version 0.23. If you want to learn more about Mesos networking, check out Christos Kozyrakis and Spike Curtis's "Mesos Networking" talk from MesosCon 2015.
While Mesos is not opinionated about service discovery, there is a Mesos-specific solution that is often used in practice: Mesos-DNS (see "Pure-Play DNS-Based Solutions"). There are also a multitude of emerging solutions, such as traefik (see "Wrapping It Up"), that are integrated with Mesos and gaining traction.
Note
Because Mesos-DNS is the recommended default service discovery mechanism with Mesos, it's important to pay attention to how Mesos-DNS represents tasks. For example, a running task might have the (logical) service name webserver.marathon.mesos, and you can find out the port allocations via DNS SRV records.
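As a sketch, assuming a Marathon app named webserver and a Mesos-DNS server configured as the resolver, standard DNS tooling is enough to locate it; the A record returns the host IP and the SRV record also carries the allocated port:

# Resolve the host(s) the task runs on (A record)
$ dig +short webserver.marathon.mesos

# Resolve host and port allocations (SRV record)
$ dig +short _webserver._tcp.marathon.mesos SRV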
If you want to try out Mesos online for free you can use the Katacoda "Deploying Containers to DC/OS" scenario.
HashiCorp Nomad
Nomad is a cluster scheduler by HashiCorp, the makers of Vagrant. It was introduced in September 2015 and primarily aims at simplicity. The main idea is that Nomad is easy to install and use. Its scheduler design is reportedly inspired by Google's Omega, borrowing concepts such as having a global state of the cluster as well as employing an optimistic, concurrent scheduler.
Nomad has an agent-based architecture with a single binary that can take on different roles, supporting rolling upgrades as well as draining nodes for re-balancing. Nomad makes use of both a consensus protocol (strongly consistent) for all state replication and scheduling and a gossip protocol used to manage the addresses of servers for automatic clustering and multiregion federation. In Figure 4-5, you can see Nomad's architecture:
- Servers are responsible for accepting jobs from users, managing clients, and computing task placements.
- Clients (one per VM instance) are responsible for interacting with the tasks or applications contained within a job. They work in a pull-based manner; that is, they register with the server and then they poll it periodically to watch for pending work.
Jobs in Nomad are defined in a HashiCorp-proprietary format called HCL or in JSON, and Nomad offers a command-line interface as well as an HTTP API to interact with the server process. Nomad models infrastructure as regions and datacenters. Regions may contain multiple datacenters, depending on what scale you are operating at. You can think of a datacenter like a zone in AWS, Azure, or Google Cloud (say, us-central1-b), and a region might be something like Iowa (us-central1).
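As a quick sketch of the workflow (exact subcommands vary slightly between Nomad versions, so treat this as illustrative): you start an agent, write a job file in HCL, and submit it via the CLI, which talks to a server's HTTP API:

# Start a single-node agent in dev mode (acts as both server and client)
$ nomad agent -dev

# Generate a skeleton job file (example.nomad) and submit it
$ nomad init
$ nomad run example.nomad

# Check where the tasks were placed and their current status
$ nomad status example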
I'm assuming you're familiar with Nomad and its terminology. If not, I suggest you watch "Nomad: A Distributed, Optimistically Concurrent Scheduler: Armon Dadgar, HashiCorp", a nice introduction to Nomad, and also read the well-done docs.
To try out Nomad, use the UI Demo HashiCorp provides or try it out online for free using the Katacoda "Introduction to Nomad" scenario.
Nomad comes with a couple of so-called task drivers, from general-purpose exec to java to qemu and docker. For the docker driver Nomad requires, at the time of this writing, Docker version 1.10 or greater and uses port binding to expose services running in containers using the port space on the host's interface. It provides automatic and manual mapping schemes for Docker, binding both TCP and UDP protocols to ports used for Docker containers.
For more details on networking options, such as mapping ports and using labels, see the documentation.
With v0.2, Nomad introduced a Consul-based (see "Consul") service discovery mechanism. It includes health checks and assumes that tasks running inside Nomad also need to be able to connect to the Consul agent, which can, in the context of containers using bridge mode networking, pose a challenge.
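Once a task's service is registered in Consul, other services can discover it through Consul's DNS interface or HTTP API. The following lookups are a sketch that assumes a service registered under the name web and a local Consul agent listening on the default ports:

# Look up healthy instances of the "web" service, including ports, via Consul DNS
$ dig @127.0.0.1 -p 8600 web.service.consul SRV

# Or query the Consul HTTP API directly
$ curl http://127.0.0.1:8500/v1/catalog/service/web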
Community Matters
An important aspect you'll want to consider when selecting an orchestration system is the community behind and around it.2 Here are a few indicators and metrics you can use:
- Is the governance backed by a formal entity or process, such as the Apache Software Foundation (ASF) or the Linux Foundation (LF)?
- How active are the mailing list, the IRC channel, the bug/issue tracker, the Git repo (number of patches or pull requests), and other community initiatives? Take a holistic view, but make sure that you actually pay attention to the activity there. Healthy and hopefully growing communities will tend to have high participation in at least one of these areas.
- Is the orchestration tool (implicitly) controlled by a single entity? For example, in the case of Nomad, HashiCorp is in control; for Apache Mesos it's mainly Mesosphere (and to some extent Twitter); etc.
- Are multiple independent providers and support channels available? For example, you can run Kubernetes in many different environments and get help from many (commercial) organizations as well as individuals on Slack, mailing lists, or forums.
Wrapping It Up
As of early 2018, Kubernetes (discussed in Chapter 7) can be considered the de facto container orchestration standard. All major providers, including Docker and DC/OS (Mesos), support Kubernetes.
Next, we'll move on to service discovery, a vital part of container orchestration.
1 Essentially, this means that you can simply keep using docker run commands and the deployment of your containers in a cluster happens automagically.
2 Now, you might argue that this is not specific to the container orchestration domain but a general OSS issue, and you'd be right. Still, I believe it is important enough to mention it, as many people are new to this area and can benefit from these insights.