Chapter 1. Kubernetes Concepts

Kubernetes is an open source orchestration system for managing containerized applications across multiple hosts, providing basic mechanisms for the deployment, maintenance, and scaling of applications.

Kubernetes, or “k8s” or “kube” for short, allows the user to declaratively specify the desired state of a cluster using high-level primitives. For example, the user may specify that she wants three instances of the WildFly server container running. Kubernetes’ self-healing mechanisms, such as automatically restarting, rescheduling, and replicating containers, then converge the actual state toward the desired state.

Kubernetes supports Docker and rkt (formerly Rocket) containers. An abstraction around the containerization layer allows other container image formats and runtimes to be supported in the future. But while multiple container formats are supported, Docker is by far the most prevalent format used with Kubernetes.

All resource files in this chapter are available on GitHub.

Pods

A pod is the smallest deployable unit that can be created, scheduled, and managed. It’s a logical collection of containers that belong to an application. Pods are created in a namespace. All containers in a pod share the namespace, volumes, and networking stack. This allows containers in the pod to “find” each other and communicate using localhost.

Each resource in Kubernetes can be defined using a configuration file. For example, a WildFly pod can be defined with the configuration file shown in Example 1-1.

Example 1-1. Pod configuration
apiVersion: v1 1
kind: Pod 2
metadata: 3
  name: wildfly-pod 4
  labels: 5
    name: wildfly-pod
spec: 6
  containers: 7
  - name: wildfly 8
    image: jboss/wildfly:10.1.0.Final 8
    ports:
    - containerPort: 8080 9

This configuration file uses the following properties:

1

apiVersion defines the version of the Kubernetes API. For core resources such as pods, this is v1; versioning the API this way allows it to evolve in the future.

2

kind defines the type of this resource—in this example, that value is Pod.

3

metadata allows you to attach information about the resource.

4

Each resource must have a name attribute. If this attribute is not set, then you must specify the generateName attribute, which is then used as a prefix to generate a unique name. Optionally, you can use a namespace property to specify a namespace for the pod. Namespaces provide a scope for names and are explained further in “Namespaces”.

In addition to these properties, there are two types of metadata: metadata.labels and metadata.annotations. They both are defined as key/value pairs.

5

Labels are designed to specify identifying attributes of the object that are meaningful and relevant to the users, but which do not directly imply semantics to the core system. Multiple labels can be attached to a resource. For example, name: wildfly-pod is a label assigned to this pod. Labels can be used to organize and to select subsets of objects.

Annotations are defined using metadata.annotations[]. They are designed to be nonidentifying arbitrary information attached to the object. Some information that can be recorded here is build/release information, such as release IDs, Git branch, and PR numbers.

6

spec defines the specification of the resource, pod in our case.

7

containers defines all the containers within the pod.

8

Each container must have name and image properties. name defines the name of the container, which must be unique within the pod, and image defines the Docker image used for that container. Some other commonly used properties in this section are:

args

A command array containing arguments to the entry point

env

A list of environment variables in key:value format to set in the container
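
For example, a hypothetical env entry for the WildFly container (the JVM options are illustrative; JAVA_OPTS is read by the WildFly startup scripts):

containers:
- name: wildfly
  image: jboss/wildfly:10.1.0.Final
  # env sets environment variables inside the container
  env:
  - name: JAVA_OPTS
    value: "-Xmx512m -Djava.net.preferIPv4Stack=true"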

9

ports define the list of ports to expose from the container. WildFly runs on port 8080, and thus that port is listed here. This allows other resources in Kubernetes to access this container on this port.

In addition, restartPolicy can be used to define the restart policy of all containers within the pod. volumes[] can be used to list volumes that can be mounted by containers belonging to the pod.

Pods are generally not created directly, as they do not survive node or scheduling failures. They are mostly created using a replication controller or deployment.
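
For experimentation, though, the pod from Example 1-1 can be created directly with kubectl (a minimal sketch, assuming the configuration is saved as wildfly-pod.yaml):

# create the pod from the configuration file
kubectl create -f wildfly-pod.yaml

# list pods and check their status
kubectl get pods

# show events, ports, and other details for this pod
kubectl describe pod wildfly-pod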

More details about the pod configuration file are available at the Kubernetes website.

Replication Controllers

A replication controller (RC) ensures that a specified number of pod “replicas” are running at any one time. Unlike manually created pods, the pods maintained by a replication controller are automatically replaced if they fail, get deleted, or are terminated. A replication controller ensures the recreation of a pod when the worker node fails or reboots. It also allows the number of replicas to be scaled up or down.

A replication controller creating two instances of a WildFly pod can be defined as shown in Example 1-2.

Example 1-2. Replication controller configuration
apiVersion: v1
kind: ReplicationController 1
metadata:
  name: wildfly-rc
spec:
  replicas: 2 2
  selector: 3
    app: wildfly-rc-pod
  template: 4
    metadata:
      labels:
        app: wildfly-rc-pod
    spec:
      containers:
      - name: wildfly
        image: jboss/wildfly:10.1.0.Final
        ports:
        - containerPort: 8080

The apiVersion, kind, metadata, and spec properties serve the same purpose in all configuration files.

This configuration file has the following additional properties:

1

The value of kind is ReplicationController, which indicates that this resource is a replication controller.

2

replicas defines the number of replicas of the pod that should concurrently run. By default, only one replica is created.

3

selector is an optional property. The replication controller manages the pods that contain the labels defined by the spec.selector property. If specified, this value must match spec.template.metadata.labels.

All labels specified in the selector must match the labels on the selected pod.

4

template is the only required field of spec in this case. The value of this field is exactly the same as a pod, except it is nested and does not have an apiVersion or kind. Note that spec.template.metadata.labels matches the value specified in spec.selector. This ensures that all pods started by this replication controller have the required metadata in order to be selected.

Each pod started by this replication controller has a name in the format <name-of-the-RC>-<hash-value-of-pod-template>. In our case, all names will be wildfly-rc-xxxxx, where xxxxx is the hash value of the pod template.
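
For example, the replicas can be scaled up or down with kubectl (a sketch, assuming the replication controller from Example 1-2 is running):

# scale the replication controller to three replicas
kubectl scale rc wildfly-rc --replicas=3

# list the pods selected by the app=wildfly-rc-pod label
kubectl get pods -l app=wildfly-rc-pod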

More details about replication controllers are available at the Kubernetes website.

Replica Sets

Replica sets are the next-generation replication controllers. Just like a replication controller, a replica set ensures that a specified number of pod replicas are running at any one time. The only difference between a replication controller and a replica set is the selector support.

For replication controllers, matching pods must satisfy all of the specified label constraints. The supported operators are =, ==, and !=. The first two operators are synonyms and represent equality. The last operator represents inequality.

For replica sets, filtering is done according to a set of values. The supported operators are in, notin, and exists (only for the key). For example, a replication controller can select pods such as environment = dev. A replica set can select pods such as environment in ["dev", "test"].
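
The same styles of selector can be used with kubectl to list resources (a sketch, assuming the pods carry an environment label):

# equality-based selector, as supported by replication controllers
kubectl get pods -l environment=dev

# set-based selector, as supported by replica sets
kubectl get pods -l 'environment in (dev,test)'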

A replica set creating two instances of a WildFly pod can be defined as shown in Example 1-3.

Example 1-3. Replica set configuration
apiVersion: extensions/v1beta1 1
kind: ReplicaSet 2
metadata:
  name: wildfly-rs
spec:
  replicas: 2
  selector:
    matchLabels: 3
      app: wildfly-rs-pod 4
    matchExpressions: 5
      - {key: tier, operator: In, values: ["backend"]} 6
      - {key: environment, operator: NotIn, values: ["prod"]} 6
  template:
    metadata:
      labels:
        app: wildfly-rs-pod
        tier: backend
        environment: dev
    spec:
      containers:
      - name: wildfly
        image: jboss/wildfly:10.1.0.Final
        ports:
        - containerPort: 8080

The key differences between Examples 1-2 and 1-3 are as follows:

1

The apiVersion property value is extensions/v1beta1. This means that this object is not part of the “core” API at this time, but is only a part of the extensions group. Read more about API versioning at the Kubernetes GitHub page.

2

The value of kind is ReplicaSet, which indicates the type of this resource.

3

matchLabels defines the list of labels that must be on the selected pod. Each label is a key/value pair.

4

wildfly-rs-pod is the exact label that must be on the selected pod.

5

matchExpressions defines the list of pod selector requirements.

6

Each expression is defined as a combination of three key/value pairs. The keys are key, operator, and values. Their values are, respectively, one of the keys from the pod’s labels; one of the operators In, NotIn, Exists, or DoesNotExist; and a nonempty set of values.

All the requirements, from both matchLabels and matchExpressions, must match for the pod to be selected.

Replica sets are generally never created on their own. Deployments own and manage replica sets to orchestrate pod creation, deletion, and updates. See the following section for more details about deployments.

More details about replica sets are available at the Kubernetes website.

Deployments

Deployments provide declarative updates for pods and replica sets. You can easily achieve the following functionality using a deployment:

  • Start a replication controller or replica set.

  • Check the status of a deployment.

  • Update a deployment to use a new image, without any outages (see the kubectl sketch at the end of this section).

  • Roll back a deployment to an earlier revision.

A WildFly deployment with three replicas can be defined using the configuration file shown in Example 1-4.

Example 1-4. Deployment configuration
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: wildfly-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: wildfly
    spec:
      containers:
      - name: wildfly
        image: jboss/wildfly:10.1.0.Final
        ports:
        - containerPort: 8080

Two main differences from Example 1-2 are:

  • The apiVersion property value is extensions/v1beta1. This means that this object is not part of the “core” API at this time and is only a part of the extensions group. Read more about API versioning at the Kubernetes GitHub page.

  • The value of the kind property is Deployment, which indicates the type of resource.
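
The update and rollback capabilities listed at the beginning of this section can be exercised with kubectl (a sketch; the newer image tag is illustrative):

# update the image, triggering a rolling update without any outage
kubectl set image deployment/wildfly-deployment wildfly=jboss/wildfly:11.0.0.Final

# watch the rollout until it completes
kubectl rollout status deployment/wildfly-deployment

# roll back to the previous revision if needed
kubectl rollout undo deployment/wildfly-deployment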

More details about deployment are available in the Kubernetes user guide.

Services

A pod is ephemeral. Each pod is assigned a unique IP address. If a pod that belongs to a replication controller dies, then it is recreated and may be given a different IP address. Further, additional pods may be created using replication controllers. This makes it difficult for an application server such as WildFly to access a database such as Couchbase using its IP address.

A service is an abstraction that defines a logical set of pods and a policy by which to access them. The IP address assigned to a service does not change over time, and thus can be relied upon by other pods. Typically, the pods belonging to a service are defined by a label selector. This is similar to how pods belong to a replication controller.

This abstraction of selecting pods using labels enables a loose coupling. The number of pods in the replication controller may scale up or down, but the application server can continue to access the database using the service.

Multiple resources, such as a service and a replication controller, may be defined in the same configuration file. In this case, each resource definition in the configuration file needs to be separated by ---.

For example, a WildFly service and a replication controller that creates matching pods can be defined as shown in Example 1-5.

Example 1-5. Service configuration
apiVersion: v1
kind: Service 1
metadata:
  name: wildfly-service
spec:
  selector:
    app: wildfly-rc-pod 2
  ports:
    - name: web 3
      port: 8080
--- 4
apiVersion: v1
kind: ReplicationController 1
metadata:
  name: wildfly-rc
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: wildfly-rc-pod 2
    spec:
      containers:
      - name: wildfly
        image: jboss/wildfly:10.1.0.Final
        ports:
        - containerPort: 8080

Multiple resources are created in the order they are specified in the file.

In this configuration file:

1

There are two resources: a service and a replication controller.

2

The service selects any pods that contain the label app: wildfly-rc-pod. The replication controller attaches this label to the pods it creates.

3

port defines the port on which the service is accessible. A service can map an incoming port to any target port in the container using targetPort. By default, targetPort is the same as port.
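
For example, a hypothetical mapping that exposes the service on port 80 while routing traffic to container port 8080:

ports:
  - port: 80
    targetPort: 8080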

A service may expose multiple ports. In this case, each port must be given a unique name:

ports:
  - name: web
    port: 8080
  - name: management
    port: 9990
4

--- is the separator between multiple resources.

By default, a service is available only inside the cluster. It can be exposed outside the cluster, as covered in “Exposing a Service”.
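
You can create both resources and verify the stable cluster IP assigned to the service (a sketch, assuming the configuration from Example 1-5 is saved as wildfly-service.yaml):

# create the service and the replication controller
kubectl create -f wildfly-service.yaml

# show the cluster IP and port assigned to the service
kubectl get svc wildfly-service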

More details about services are available at the Kubernetes website.

Jobs

A job creates one or more pods and ensures that a specified number of them successfully complete. When the specified number of pods has successfully completed, the job itself is complete. The job starts a new pod if a pod fails or is deleted, for example because of a hardware failure.

This is different from a replication controller or a deployment, which ensure that a certain number of pods are always running. If a pod in a replication controller or deployment terminates, it is restarted. This makes replication controllers and deployments both long-running processes, which is well suited for an application server such as WildFly. But a job is completed only when the specified number of pods successfully completes, which is well suited for tasks that need to run only once. For example, a job may convert one image format to another. Restarting this pod in a replication controller would not only cause redundant work but may even be harmful in certain cases.

There are two main types of jobs:

Nonparallel jobs

Job specification consists of a single pod. The job completes when the pod successfully terminates.

Parallel jobs

The job completes when a predefined number of pods successfully complete. Alternatively, a work queue pattern can be implemented, where pods coordinate among themselves or with an external service to determine what each should work on.
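
For example, a hypothetical spec fragment for a parallel job that requires five successful completions, running at most two pods at a time:

spec:
  completions: 5
  parallelism: 2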

A nonparallel job can be defined using the configuration file shown in Example 1-6.

Example 1-6. Job configuration
apiVersion: batch/v1 1
kind: Job 2
metadata:
  name: wait
spec:
  template:
    metadata:
      name: wait
    spec: 3
      containers:
      - name: wait
        image: ubuntu 4
        command: ["sleep",  "20"] 5
      restartPolicy: Never 6

In this configuration file:

1

Jobs are defined in their own API group using the path batch/v1.

2

The Job value defines this resource to be of the type job.

3

spec defines the pod template used to run the job. This is similar to a replication controller.

4

This job uses the ubuntu base image. Usually, this will be a custom image that performs the run-once task.

5

By default, running the ubuntu image starts the shell. In this case, command overrides the default command and waits for 20 seconds. Note, this is only an example usage. The actual task would typically be done in the image itself.

6

Each pod template must explicitly specify the restartPolicy equal to Never or OnFailure. A value of Never means that the pod is marked Succeeded or Failed depending upon the number of containers running and how they exited. A value of OnFailure means the pod is restarted if the container in the pod exits with a failure. More details about these policies are available at the Kubernetes website.

Kubernetes 1.4 introduced a new alpha resource called ScheduledJob. This resource was renamed to CronJob starting in version 1.5.

CronJob allows you to manage time-based jobs. There are two primary use cases:

  • Run jobs once at a specified point in time.

  • Run jobs repeatedly at a specified point in time.

Note, this is an alpha resource, so it needs to be explicitly enabled.
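
A minimal sketch of a CronJob that runs the job from Example 1-6 every hour (assuming the alpha batch/v2alpha1 API group is enabled on the cluster):

apiVersion: batch/v2alpha1
kind: CronJob
metadata:
  name: wait-cron
spec:
  # standard cron syntax: at minute 0 of every hour
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: wait
            image: ubuntu
            command: ["sleep", "20"]
          restartPolicy: Never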

Volumes

Pods are ephemeral and work well for a stateless container. They are restarted automatically when they die, but any data stored in their filesystem is lost with them. Stateful containers, such as Couchbase, require data to be persisted outside the lifetime of a container running inside a pod. This is where volumes help.

A volume is a directory that is accessible to the containers in a pod. The directory, the medium that backs it, and the contents within it are determined by the particular volume type used. A volume outlives any containers that run within the pod, and the data is preserved across container restarts.

Multiple types of volumes are supported. Some of the commonly used volume types are shown in Table 1-1.

Table 1-1. Common volume types in Kubernetes

Volume type           Mounts into your pod
hostPath              A file or directory from the host node’s filesystem
nfs                   An existing Network File System (NFS) share
awsElasticBlockStore  An Amazon Web Services EBS volume
gcePersistentDisk     A Google Compute Engine persistent disk

Two properties need to be defined for a volume to be used inside a pod: spec.volumes to define the volume type, and spec.containers.volumeMounts to specify where to mount the volume. Multiple volumes in a pod and multiple mount points in a container can be easily defined. A process in a container sees a filesystem view composed of the Docker image and volumes in the pod.

A volume defined in the pod configuration file is shown in Example 1-7.

Example 1-7. Volume configuration
apiVersion: v1
kind: Pod
metadata:
  name: couchbase-pod
  labels:
    name: couchbase-pod
spec:
  containers:
  - name: couchbase
    image: arungupta/oreilly-couchbase:k8s 1
    ports:
    - containerPort: 8091
    volumeMounts: 2
    - mountPath: /var/couchbase/lib 3
      name: couchbase-data 4
  volumes: 5
  - name: couchbase-data 4
    hostPath: 6
      path: /opt/data 7

In this configuration file:

1

The pod uses the image arungupta/oreilly-couchbase:k8s. This image is based on the Couchbase image and uses the Couchbase REST API to configure the Couchbase server and create a sample bucket in it.

2

The volumeMounts property defines where the volume is mounted in the container.

3

mountPath defines the path where the volume is mounted in the container.

4

name refers to a named volume defined using volumes. This value must match the value of the name property of one of the volumes defined in volumes.

5

volumes defines the volumes accessible to the pod.

6

hostPath defines the type of volume mounted. This volume type mounts a directory from the host node’s filesystem. A different volume type may be specified here.

7

/opt/data is the path in the host node filesystem.

You can create an Amazon Elastic Block Storage (EBS) volume using the aws ec2 create-volume command. Alternatively, you can create a Google Cloud persistent disk using the gcloud compute disks create command. You can mount these volumes in the container using the awsElasticBlockStore and gcePersistentDisk volume types, respectively.
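
For example (a sketch; the zone, size, and volume ID are illustrative):

# create a 10 GiB EBS volume on AWS
aws ec2 create-volume --availability-zone us-west-2a --size 10 --volume-type gp2

# create a 10 GB persistent disk on Google Cloud
gcloud compute disks create couchbase-disk --size 10GB --zone us-west1-a

The resulting volume can then replace the hostPath entry in Example 1-7:

volumes:
- name: couchbase-data
  awsElasticBlockStore:
    volumeID: <volume-id>
    fsType: ext4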

More details about volumes, including different types of volumes and how to configure them, are available at the Kubernetes website.

Architecture

The key components of the Kubernetes architecture are shown in Figure 1-1.

Figure 1-1. Kubernetes architecture

A Kubernetes cluster is a set of physical or virtual machines and other infrastructure resources that are used to run your applications. Each machine is called a node. The machines that manage the cluster are called master nodes, and the machines that run the containers are called worker nodes. Each node runs the services necessary to run application containers.

The two typical interaction points with Kubernetes are kubectl and client applications accessing the cluster from the internet.

Master nodes

A master node is a central control plane that provides a unified view of the cluster. You can easily create a Kubernetes cluster with a single master node for development. Alternatively, you can create a highly available Kubernetes cluster with multiple master nodes. Let’s look at the key components in the master node:

kubectl

This is a command-line tool that sends commands to the master node to create, read, update, and delete resources. For example, it can request the creation of a pod by passing the pod configuration file, or it can query for more details about the replicas running for a replica set. It reads container manifests as YAML or JSON files that describe each resource. A typical way to provide this manifest is the configuration file, as shown in the previous sections. This process is explained more in “Running Your First Java Application”.

API server

Each command from kubectl is translated into a REST API call and issued to the API server running on the master node. The API server processes REST operations, validates them, and persists the state in a distributed, watchable storage. This is implemented using etcd for Kubernetes.
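
You can observe these REST calls by increasing kubectl’s logging verbosity (a sketch):

# -v=8 prints the HTTP requests sent to the API server
kubectl get pods -v=8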

Scheduler

The scheduler works with the API server to schedule pods to the nodes. The scheduler has information about resources available on the worker nodes, as well as the ones requested by the pods. It uses this information to decide which node will be selected to deploy a specific pod.

Controller manager

The controller manager is a daemon that watches the state of the cluster using the API server for different controllers and reconciles the actual state with the desired one (e.g., the number of pods to run for a replica set). Some other controllers that come with Kubernetes are the namespace controller and the horizontal pod autoscaler.

etcd

This is a simple, distributed, watchable, and consistent key/value store. It stores the persistent state of all REST API objects—for example, how many pods are deployed on each worker node, labels assigned to each pod (which can then be used to include the pods in a service), and namespaces for different resources. For reliability, etcd is typically run in a cluster.

Worker nodes

A worker node runs tasks as delegated by the master. Each worker node can run multiple pods:

Kubelet

This is a service running on each node that manages containers and is managed by the master. It receives REST API calls from the master and manages the resources on that node. Kubelet ensures that the containers defined in the API call are created and started.

Kubelet is a Kubernetes-internal concept and generally does not require direct manipulation.

Proxy

This runs on each node, acting as a network proxy and load balancer for a service on a worker node. Client requests coming through an external load balancer will be redirected to the containers running in a pod through this proxy.

Docker

Docker Engine is the container runtime running on each node. It understands the Docker image format and knows how to run Docker containers. Alternatively, Kubernetes may be configured to use rkt as the container runtime. More details about that are available in the guide to running Kubernetes with rkt.
