Chapter 4. Managed Lifecycle

Containerized applications managed by cloud-native platforms have no control over their lifecycle, and to be good cloud-native citizens, they have to listen to the events emitted by the managing platform and adapt their lifecycles accordingly. The Managed Lifecycle pattern describes how applications can and should react to these lifecycle events.

Problem

In Chapter 3, “Health Probe”, we explained why containers have to provide APIs for the different health checks. Health-check APIs are read-only endpoints the platform continually probes to get application insight. They are a mechanism for the platform to extract information from the application.

In addition to monitoring the state of a container, the platform may sometimes issue commands and expect the application to react to them. Driven by policies and external factors, a cloud-native platform may decide to start or stop the applications it is managing at any moment. It is up to the containerized application to determine which events are important to react to and how to react. But in effect, this is an API that the platform uses to communicate with and send commands to the application. Applications are free to either benefit from lifecycle management or ignore it if they don’t need this service.

Solution

We saw that checking only the process status is not a good enough indication of the health of an application. That is why there are different APIs for monitoring the health of a container. Similarly, using only the process model to run and stop a process is not good enough. Real-world applications require more fine-grained interactions and lifecycle management capabilities. Some applications need help to warm up, and some need a gentle and clean shutdown procedure. For these and other use cases, the platform emits events, as shown in Figure 4-1, that the container can listen to and react to if desired.

Figure 4-1. Managed container lifecycle

The deployment unit of an application is a Pod. As you already know, a Pod is composed of one or more containers. At the Pod level, there are other constructs, such as init containers, which we cover in Chapter 7, “Init Container”, and defer containers, which are still at the proposal stage as of this writing, that can help manage the container lifecycle. The events and hooks we describe in this chapter are all applied at the level of an individual container rather than at the Pod level.

SIGTERM Signal

Whenever Kubernetes decides to shut down a container, whether because the Pod it belongs to is shutting down or because a failed liveness probe causes the container to be restarted, the container receives a SIGTERM signal. SIGTERM is a gentle poke for the container to shut down cleanly before Kubernetes sends a more abrupt SIGKILL signal. Once a SIGTERM signal has been received, the application should shut down as quickly as possible. For some applications, this might be a quick termination, while others may have to complete their in-flight requests, release open connections, and clean up temporary files, which can take slightly longer. In all cases, reacting to SIGTERM is the right moment to shut down a container in a clean way.

SIGKILL Signal

If a container process has not shut down after a SIGTERM signal, it is shut down forcefully by a subsequent SIGKILL signal. Kubernetes does not send the SIGKILL signal immediately but waits for a grace period, 30 seconds by default, after it has issued the SIGTERM signal. This grace period can be defined per Pod using the .spec.terminationGracePeriodSeconds field, but it cannot be guaranteed, as it can be overridden while issuing commands to Kubernetes (for example, with kubectl delete --grace-period). The aim should be to design and implement containerized applications as ephemeral processes with quick startup and shutdown.
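As a minimal sketch, a Pod that needs more time to drain work could extend the grace period like this (the Pod name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown          # hypothetical name
spec:
  # Give the container up to 60 seconds (instead of the default 30)
  # after SIGTERM before Kubernetes sends SIGKILL
  terminationGracePeriodSeconds: 60
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
```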

Poststart Hook

Using only process signals for managing lifecycles is somewhat limited. That is why there are additional lifecycle hooks such as postStart and preStop provided by Kubernetes. A Pod manifest containing a postStart hook looks like the one in Example 4-1.

Example 4-1. A container with poststart hook
apiVersion: v1
kind: Pod
metadata:
  name: post-start-hook
spec:
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    lifecycle:
      postStart:
        exec:
          command:      1
          - sh
          - -c
          - sleep 30 && echo "Wake up!" > /tmp/postStart_done
1

The postStart command here waits 30 seconds. sleep is just a simulation of any lengthy startup code that might run at this point. It also uses a trigger file to synchronize with the main application, which starts in parallel.

The postStart command is executed after a container is created, asynchronously with the primary container’s process. Even though much of the application initialization and warm-up logic can be implemented as part of the container startup steps, postStart still covers some use cases. The postStart action is a blocking call, and the container status remains Waiting until the postStart handler completes, which in turn keeps the Pod status in the Pending state. This nature of postStart can be used to delay the startup state of the container while giving the main container process time to initialize.

Another use of postStart is to prevent a container from starting when the Pod does not fulfill certain preconditions. For example, when the postStart hook indicates an error by returning a nonzero exit code, the main container process gets killed by Kubernetes.
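Such a precondition check might look like the following sketch, where the config file path is a hypothetical example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: precondition-check         # hypothetical name
spec:
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    lifecycle:
      postStart:
        exec:
          command:
          - sh
          - -c
          # Exit nonzero if the expected config file is missing;
          # Kubernetes then kills the main container process
          - test -f /config/app.properties
```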

postStart and preStop hook invocation mechanisms are similar to the Health Probes described in Chapter 3 and support these handler types:

exec

Runs a command directly in the container

httpGet

Executes an HTTP GET request against a port opened by one of the Pod’s containers

You have to be very careful about what critical logic you execute in the postStart hook because there are no guarantees for its execution. Since the hook runs in parallel with the container process, it may be executed before the container has started. Also, the hook is intended to have at-least-once semantics, so the implementation has to take care of duplicate executions. Another aspect to keep in mind is that the platform does not retry failed HTTP requests that didn’t reach the handler.

Prestop Hook

The preStop hook is a blocking call sent to a container before it is terminated. It has the same semantics as the SIGTERM signal and should be used to initiate a graceful shutdown of the container when reacting to SIGTERM is not possible. The preStop action in Example 4-2 must complete before the call to delete the container is sent to the container runtime, which triggers the SIGTERM notification.

Example 4-2. A container with a preStop hook
apiVersion: v1
kind: Pod
metadata:
  name: pre-stop-hook
spec:
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    lifecycle:
      preStop:
        httpGet:           1
          port: 8080
          path: /shutdown
1

Call out to a /shutdown endpoint running within the application

Even though preStop is blocking, holding on to it or returning a nonsuccessful result does not prevent the container from being deleted and the process killed. preStop is only a convenient alternative to a SIGTERM signal for graceful application shutdown and nothing more. It also offers the same handler types and guarantees as the postStart hook we covered previously.
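Because preStop supports the same handler types, the HTTP-based hook from Example 4-2 could equally be expressed as an exec handler; here is a sketch in which the shutdown script path is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pre-stop-exec              # hypothetical name
spec:
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    lifecycle:
      preStop:
        exec:
          command:
          - sh
          - -c
          # Hypothetical script that drains in-flight work before exit
          - /opt/shutdown.sh
```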

Other Lifecycle Controls

In this chapter, so far we have focused on the hooks that allow executing commands when a container lifecycle event occurs. But there is another mechanism, operating at the Pod level rather than the container level, for executing initialization instructions.

We describe init containers in depth in Chapter 7, “Init Container”, but here we describe them briefly to compare them with lifecycle hooks. Unlike regular application containers, init containers run sequentially, run until completion, and run before any of the application containers in a Pod start up. These guarantees allow using init containers for Pod-level initialization tasks. Lifecycle hooks and init containers operate at different granularities (the container level and the Pod level, respectively) and could be used interchangeably in some instances, or complement each other in other cases. Table 4-1 summarizes the main differences between the two.

Table 4-1. Lifecycle Hooks and Init Containers

Activates on
  Lifecycle hooks: container lifecycle phases.
  Init containers: Pod lifecycle phases.

Startup phase action
  Lifecycle hooks: a postStart command.
  Init containers: a list of initContainers to execute.

Shutdown phase action
  Lifecycle hooks: a preStop command.
  Init containers: no equivalent feature.

Timing guarantees
  Lifecycle hooks: a postStart command is executed at the same time as the container’s ENTRYPOINT.
  Init containers: all init containers must complete successfully before any application container can start.

Use cases
  Lifecycle hooks: perform noncritical startup/shutdown cleanups specific to a container.
  Init containers: perform workflow-like sequential operations using containers; reuse containers for task executions.
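As a brief illustration of the Pod-level guarantees that init containers provide (the service name and check are illustrative; Chapter 7 covers init containers in depth):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo                  # hypothetical name
spec:
  initContainers:
  # Runs to completion before the application container below starts
  - name: wait-for-service         # hypothetical precondition check
    image: busybox
    command: [ sh, -c, "until nslookup my-service; do sleep 2; done" ]
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
```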

If even more control is required to manage the lifecycle of your application containers, there is an advanced technique for rewriting the container entrypoints, sometimes also referred to as the Commandlet Pattern (https://youtu.be/iPRw_n_JV4o). This pattern is especially useful when the main containers within a Pod have to be started in a certain order and need an extra level of control. Kubernetes-based pipeline platforms like Tekton and Argo CD require a sequential ordering of container execution within a Pod for containers that share data, and allow for additional sidecar containers (we talk more about sidecars in Chapter 15).

For these scenarios, a sequence of init containers is not good enough because init containers don’t allow sidecars. As an alternative, an advanced technique called entrypoint rewriting can be used to allow fine-grained lifecycle control for the Pod’s main containers. Every container image defines a command that is executed by default when the container starts, and you can also override this command directly in the Pod specification. The idea of entrypoint rewriting is to replace this command with a generic wrapper command that calls the original command and takes care of lifecycle concerns. This generic command is injected from another container image before the application container starts.

This concept is best explained by an example. Example 4-3 shows a typical Pod declaration that starts a single container with the given arguments.

Example 4-3. Simple pod starting an image with a command and arguments
apiVersion: v1
kind: Pod
metadata:
  name: simple-random-generator
spec:
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    command:
    - "random-generator-runner"   1
    args:                         2
    - "--seed"
    - "42"
1

The command executed when the container starts

2

Additional arguments provided to the entrypoint command

The trick is now to wrap the given command random-generator-runner with a generic supervisor program that takes care of lifecycle aspects, like reacting to SIGTERM or other external signals.

Example 4-4. Pod that wraps the original entrypoint with a supervisor
apiVersion: v1
kind: Pod
metadata:
  name: wrapped-random-generator
spec:
  volumes:
  - name: wrapper                    1
    emptyDir: {}
  initContainers:
  - name: copy-supervisor            2
    image: k8spatterns/supervisor
    volumeMounts:
    - mountPath: /var/run/wrapper
      name: wrapper
    command: [ cp ]
    args: [ supervisor, /var/run/wrapper/supervisor ]
  containers:
  - image: k8spatterns/random-generator:1.0
    name: random-generator
    volumeMounts:
    - mountPath: /var/run/wrapper
      name: wrapper
    command:
    - /var/run/wrapper/supervisor  3
    args:                            4
    - random-generator-runner
    - --seed
    - 42
1

A fresh emptyDir volume is created to share the supervisor daemon

2

Init container used for copying the supervisor daemon to the application containers

3

The original command random-generator-runner as defined in Example 4-3 is replaced with the supervisor daemon from the shared volume.

4

The original command specification becomes the arguments for the supervisor command

This entrypoint rewriting is especially useful for Kubernetes-based applications that create and manage Pods programmatically, like Tekton, which creates Pods when running a CI pipeline. That way, they gain much better control over when to start, stop, or chain containers within a Pod.

There are no strict rules about which mechanism to use, except when you require a specific timing guarantee. We could skip lifecycle hooks and init containers entirely and use a bash script to perform specific actions as part of a container’s startup or shutdown commands. That is possible, but it would tightly couple the container with the script and turn it into a maintenance nightmare. We could also use Kubernetes lifecycle hooks to perform some actions, as described in this chapter. Alternatively, we could go even further and run containers that perform individual actions using init containers, or inject supervisor daemons for even more sophisticated control. In this sequence, each option requires increasingly more effort but in turn offers stronger guarantees and enables reuse.

Understanding the stages and available hooks of containers and Pod lifecycles is crucial for creating applications that benefit from being managed by Kubernetes.

Discussion

One of the main benefits the cloud-native platform provides is the ability to run and scale applications reliably and predictably on top of potentially unreliable cloud infrastructure. These platforms provide a set of constraints and contracts for an application running on them. It is in the interest of the application to honor these contracts to benefit from all of the capabilities offered by the cloud-native platform. Handling and reacting to these events ensures your application can gracefully start up and shut down with minimal impact on the consuming services. At the moment, in its basic form, that means the containers should behave as any well-designed POSIX process. In the future, there might be even more events giving hints to the application when it is about to be scaled up, or asked to release resources to prevent being shut down. It is essential to get into the mindset where the application lifecycle is no longer in the control of a person but fully automated by the platform.

Besides managing the application lifecycle, the other big duty of orchestration platforms like Kubernetes is to distribute containers over a fleet of nodes. The next pattern, Automated Placement, explains the options to influence the scheduling decisions from the outside.
