Chapter 1. Security and Observability Strategy

In this chapter, we will provide a high-level overview of how you can build a security and observability strategy for your Kubernetes implementation. Subsequent chapters will cover each of these concepts in more detail. You should start thinking about a security strategy while you are in the pilot/pre-production phase of your Kubernetes journey, so if you are part of the security team, this chapter is very important. If you are part of the network, platform, or application team, this chapter shows how you can be a part of the security strategy and discusses the importance of collaboration between the security, platform, and application teams.

We will cover the following concepts that will guide you with your security and observability strategy:

  • How securing Kubernetes is different from traditional security methods

  • The life cycle of deploying applications (workloads) in a Kubernetes cluster and best practices for each stage

  • How you should implement observability to help with security

  • Well-known security frameworks and how to use them in your security strategy

Security for Kubernetes: A New and Different World

In this section we’ll highlight how Kubernetes is different and why traditional security methods do not work in a Kubernetes implementation.

As workloads move to the cloud, Kubernetes is the most common orchestrator for managing them. The reason Kubernetes is popular is its declarative nature: It abstracts infrastructure details and allows users to specify the workloads they want to run and the desired outcomes. The application team does not need to worry about how workloads are deployed, where workloads are run, or other details like networking; they just need to set up configurations in Kubernetes to deploy their applications.

Kubernetes achieves this abstraction by managing workload creation, shutdown, and restart. In a typical implementation, a workload can be scheduled on any available resource in a network (physical host or virtual machine) based on the workload’s requirements. A group of resources that a workload runs on is known as a Kubernetes cluster. Kubernetes monitors the status of workloads (which are deployed as pods in Kubernetes) and takes corrective action as needed (e.g., restarting containers that fail their health checks). It also manages all networking necessary for pods and hosts to communicate with each other. You have the option to decide on the networking technology by selecting from a set of supported network plug-ins. While there are some configuration options for the network plug-in, you will not be able to directly control networking behavior (for example, IP address assignment or, in typical configurations, the node where a pod is scheduled).

Kubernetes is a different world for security teams. Their traditional method was to build a “network of machines” and then onboard workloads (applications). Onboarding involved assigning IPs, updating networking as needed, and defining and implementing network access control rules. After these steps, the application was ready for users. This process gave security teams a lot of control and let them onboard and secure applications with ease. Applications were easy to secure because they were static: their assigned IPs, where they were deployed, and so on rarely changed.

In the Kubernetes world, workloads are built as container images and are deployed in a Kubernetes cluster using a configuration file (YAML). This is typically integrated into the development process, and most development teams use continuous integration (CI) and continuous delivery (CD) to ensure speedy and reliable delivery of software. As a result, the security team has limited visibility into the impact of each application change on the security of the cluster. Adding a security-review step to this process is counterproductive, as the only logical place to add it is when the code is being committed. The development process after that point is automated, and disrupting it would conflict with the CI/CD model. So how can you secure workloads in this environment?

In order to understand how to secure workloads in Kubernetes, it is important to understand the various stages that are part of deploying a workload.

Deploying a Workload in Kubernetes: Security at Each Stage

In the previous section, we described the challenge of securing applications that are deployed using the CI/CD pipeline. This section describes the life cycle of workload deployment in a Kubernetes cluster and explains how to secure each stage. The three stages of workload deployment are the build, deploy, and runtime stages. Unlike traditional client-server applications where an application existed on a server (or a cluster of servers), applications in a Kubernetes deployment are distributed, and the Kubernetes cluster network is used by applications as a part of normal operation. Here are a few things to consider because of this configuration:

  • You need to consider security best practices as workloads and infrastructure are built. This is important because applications in Kubernetes are deployed using the CI/CD pipeline.

  • You need to consider security best practices when a Kubernetes cluster is deployed and applications are onboarded.

  • Finally, applications use the infrastructure and the Kubernetes cluster network for normal operation, and you need to consider security best practices for application runtime.

Figure 1-1 illustrates the various stages and aspects to consider when securing workloads in a Kubernetes environment.

Figure 1-1. Workload deployment stages and security at each stage

The boxes below each stage describe various aspects of security that you need to consider for that stage:

  • The build stage is where you create (build) software for your workload (application) and build the infrastructure components (hosts or virtual machines) to host applications. This stage is part of the development cycle, and in most cases the development team is responsible for it. In this stage you consider security for the CI/CD pipeline, implement security for image repositories, scan images for vulnerabilities, and harden the host operating system. You need to ensure that you implement best practices to secure the image registry and avoid compromising the images in the image registry. This is generally implemented by securing access to the image registry, and many users run private registries and do not allow images from public registries. Finally, you need to consider best practices for secrets management; secrets are like passwords that allow access to resources in your cluster. We will cover these topics in detail in Chapter 3. We recommend that when you consider security for this stage, you collaborate with the security team so that security at this stage is aligned with your overall security strategy.

  • The next stage, deploy, is where you set up the platform that runs your Kubernetes deployment and deploy workloads. In this stage you need to think about the security best practices for configuring your Kubernetes cluster and providing external access to applications running inside your Kubernetes cluster. You also need to consider security controls like policies to limit access to workloads (pod security policies, replaced by Pod Security Admission in Kubernetes 1.25 and later), network policies to control applications’ access to the platform components, and role-based access control (RBAC) for access to resources (for example, service creation, namespace creation, and adding/changing labels to pods). In most enterprises the platform team is responsible for this stage. As a member of the platform team, you need to collaborate with both the development and the security teams to implement your security strategy.

  • The final stage is the runtime stage, where you have deployed your application and it is operational. In this stage you need to think about network security, which involves controls using network policy, threat defense (using techniques to detect and prevent malicious activity in the cluster), and enterprise security controls like compliance, auditing, and encryption. The security team is responsible for this stage of the deployment. As a member of the security team, you need to collaborate with the platform and development teams as you design and implement runtime security. Collaboration between teams (development, platform, and security) is very important for building an effective security strategy. We recommend that you ensure all these teams are aligned.

Note that unlike with traditional security strategies, where security is enforced at a vantage point (like the perimeter), in the case of a Kubernetes cluster, you need to implement security at each stage. In addition, all teams involved (application, platform, and security) play a very important role in implementing security, so the key to implementing a successful strategy is collaboration between teams. Remember, security is a shared responsibility. Let’s explore each stage and the techniques you can use to build your strategy.

Build-Time Security: Shift Left

This section will guide you through various aspects of build-time security with examples.

Image scanning

During this stage, you need to ensure that applications do not have any major unpatched issues that are disclosed as Common Vulnerabilities and Exposures (CVEs) in the National Vulnerability Database, and that the application code and dependencies are scanned for exploits and vulnerable code segments. The images that are built and delivered as containers are then scanned for unpatched critical or major vulnerabilities disclosed as CVEs. This is usually done by checking the base image and all its packages against a database that tracks vulnerable packages. Several tools, both open source and commercial, are available to implement scanning. For example, WhiteSource, Snyk, Trivy, Anchore, and even cloud providers like Google offer scanning of container images. We recommend that you select a scanning solution that understands how containers are built and scans not only the operating system on the host but also the base images for containers. Given the dynamic nature of Kubernetes deployments, it is very important for you to secure the CI/CD pipeline; code and image scanning needs to be a part of the pipeline, and images being delivered from the image registry must be checked for compromise. You need to ensure access to the registry is controlled to avoid compromise. The popular term for this stage is shifting security left toward the development team, also known as shift-left security.
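
The pipeline gate described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular scanner works: the vulnerability records, package names, and IDs such as `EXAMPLE-0001` are hypothetical placeholders, versions are assumed to be plain dotted integers, and a real scanner would obtain both the installed package list and the vulnerability data itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vuln:
    vuln_id: str              # hypothetical identifier, not a real CVE
    package: str
    fixed_in: tuple[int, ...]  # first version that contains the fix
    severity: str             # "CRITICAL", "HIGH", "MEDIUM", or "LOW"


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted version such as '1.2.3' (assumes numeric parts only)."""
    return tuple(int(part) for part in version.split("."))


def scan_image(packages: dict[str, str], db: list[Vuln]) -> list[Vuln]:
    """Return vulnerabilities whose package is installed at a version older than the fix."""
    findings = []
    for vuln in db:
        installed = packages.get(vuln.package)
        if installed is not None and parse_version(installed) < vuln.fixed_in:
            findings.append(vuln)
    return findings


def gate(findings: list[Vuln], blocking=("CRITICAL", "HIGH")) -> bool:
    """CI/CD gate: True means the image may be pushed to the registry."""
    return not any(f.severity in blocking for f in findings)


# Hypothetical data for illustration only.
db = [
    Vuln("EXAMPLE-0001", "libfoo", (1, 2, 4), "CRITICAL"),
    Vuln("EXAMPLE-0002", "libbar", (2, 0, 0), "LOW"),
]
image_packages = {"libfoo": "1.2.3", "libbar": "2.1.0"}
findings = scan_image(image_packages, db)
print([f.vuln_id for f in findings])  # ['EXAMPLE-0001']
print(gate(findings))                 # False
```

Returning a boolean lets the CI/CD job fail the build before the image is pushed, which keeps the check automated instead of adding a manual review step.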

Host operating system hardening

Here you must ensure that the application being deployed is restricted to only the privileges it requires on the host where it is deployed. To achieve this, you should use a hardened host operating system that supports controls for restricting applications to only necessary privileges, like system calls and file system access. This allows you to effectively mitigate privilege escalation attacks, where a vulnerability in the software deployed in a container is used to gain access to the host operating system.
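
Seccomp is one such control on Linux hosts. The sketch below, which assumes the JSON profile format used by Docker and other OCI runtimes, builds an allowlist profile in which any system call not listed fails with an error. The syscall list is a hypothetical placeholder and far too short for most real applications; in practice you would derive it by profiling the workload.

```python
import json


def seccomp_allowlist(syscalls: list[str]) -> dict:
    """Build a seccomp profile that denies every syscall except the listed ones."""
    return {
        # Any syscall not matched below returns an error to the caller.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(set(syscalls)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


# Hypothetical allowlist for illustration only.
profile = seccomp_allowlist(["read", "write", "openat", "close", "exit_group", "futex"])
print(json.dumps(profile, indent=2))
```

In Kubernetes, a profile file like this can be placed in the kubelet's seccomp directory on each node and referenced from a pod's `securityContext.seccompProfile` with `type: Localhost`.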

Minimizing the attack surface: Base container images

We recommend you review the composition of the container image and minimize the software packages that make up the base image to include only packages that are absolutely necessary for your application to run. In Dockerfile-based container images, you can start with a parent image and then add your application to the image to create a container image. For example, you could start by building a base image in Docker using the FROM scratch instruction, which creates a minimal image. You can then add your application and required packages, which will give you complete control of the composition of your container images and also help with CVE management, as you do not need to worry about patching CVEs in packages in a container image that aren’t required by your application. If building a scratch image is not a viable option, you can consider starting with a distroless image (a slimmed-down Linux distribution image) or an Alpine minimal image as the base image for your container.

These techniques will help you design and implement your build-time security strategy. As a part of the development team, you will be responsible for designing and implementing build-time security in collaboration with the platform and security teams to ensure it is aligned with the overall security strategy. We caution against believing the myth that shift-left security can be your whole security strategy. That is a naive and incorrect approach to securing workloads. There are several other important aspects, such as deploy and runtime security, that need to be considered as part of your security strategy as well.

Deploy-Time Security

The next stage in securing workloads is to secure the deployment. To accomplish this, you have to harden your Kubernetes cluster where the workloads are deployed. You will need a detailed review of the Kubernetes cluster configuration to ensure that it is aligned with security best practices. Start by building a trust model for various components of your cluster. A trust model is a framework where you review a threat profile and define mechanisms to respond to it. You should leverage tools like role-based access control (RBAC), label taxonomies, label governance, and admission controls to design and implement the trust model. These mechanisms control access to resources and apply controls and validation at resource creation time. These topics are covered in detail in Chapters 3, 4, and 7. The other critical components in your cluster are the Kubernetes datastore and Kubernetes API server, and you need to pay close attention to details like access control and data security when you design the trust model for these components. We recommend you use strong credentials, public key infrastructure (PKI) for access, and Transport Layer Security (TLS) for encryption of data in transit. Securing the Kubernetes API and the datastore is covered in detail in Chapter 2.
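
To make the RBAC part of the trust model concrete, here is a simplified model of how Kubernetes RBAC matches a request against the rules of a Role. It is a sketch under stated assumptions: it ignores resource names, subresources, non-resource URLs, namespaces, and bindings, and the `app_team_role` rules are hypothetical. It does reflect one real property of RBAC: permissions are purely additive, so there are no deny rules, and anything not granted is refused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    api_groups: tuple[str, ...]  # "" is the core API group; "*" matches any group
    resources: tuple[str, ...]
    verbs: tuple[str, ...]


def _matches(allowed: tuple[str, ...], value: str) -> bool:
    return "*" in allowed or value in allowed


def is_allowed(rules: list[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    """Allowed if any rule grants the request; RBAC has no deny rules."""
    return any(
        _matches(r.api_groups, api_group)
        and _matches(r.resources, resource)
        and _matches(r.verbs, verb)
        for r in rules
    )


# Hypothetical role for an application team: manage Deployments and read Pods,
# but not create Services.
app_team_role = [
    PolicyRule(("apps",), ("deployments",), ("get", "list", "create", "update")),
    PolicyRule(("",), ("pods",), ("get", "list", "watch")),
]
print(is_allowed(app_team_role, "apps", "deployments", "create"))  # True
print(is_allowed(app_team_role, "", "services", "create"))         # False
```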

You should think of the Kubernetes cluster where mission-critical workloads are deployed as an entity and then design a trust model for the entity. This requires you to review security controls at the perimeter, which will be challenging due to the Kubernetes deployment architectures; we will cover this in the next section. For now, let’s assume the current products that are deployed at the perimeter, like web access control gateways and next-generation firewalls, are not aware of Kubernetes architecture. We recommend you tackle this by building integrations with these devices, which will make them aware of the Kubernetes cluster context so they can be effective in applying security controls at the perimeter. This way you can create a very effective security strategy where the perimeter security devices work in conjunction with security implemented inside your Kubernetes cluster. As an example, say you need to make these devices aware of the identity of your workloads (IP address, TCP/UDP port, etc.). These devices can effectively protect the hosts that make up your Kubernetes cluster, but in most cases they cannot distinguish between workloads running on a single host. If you’re running in a cloud provider environment, you can use security groups, which are virtual firewalls that allow access control to a group of nodes (such as EC2 instances in Amazon Web Services) that host workloads. Security groups are more aligned with the Kubernetes architecture than traditional firewalls and security gateways; however, even security groups are not aware of the context for workloads running inside the cluster.

To summarize, when you consider deploy-time security, you need to implement a trust model for your Kubernetes cluster and build an effective integration with perimeter security devices that protect your cluster.

Runtime Security

Now that you have a strategy in place to secure the build and deploy stages, you need to think about runtime security. The term runtime security is used for various aspects of securing a Kubernetes cluster; for example, any configuration on a host running software that protects the host and workloads from unauthorized activity (e.g., system calls, file access) is also called runtime security. Chapter 4 will cover host and workload runtime security in detail. In this section we will focus on the security best practices needed to ensure the secure operation of the Kubernetes cluster network. Kubernetes is an orchestrator that deploys workloads and applications across a network of hosts, so you must consider network security as a very important aspect of runtime security.

Kubernetes promises increased agility and more efficient use of compute resources, compared with the static partitioning and provisioning of servers or VMs. It does this by dynamically scheduling workloads across the cluster, taking into account the resource usage on each node, and connecting workloads on a flat network. By default, when a new workload is deployed, the corresponding pod could be scheduled on any node in the cluster, with any IP address within the pod IP address range. If the pod is later rescheduled elsewhere, it will normally get a different IP address. This means that pod IP addresses need to be treated as ephemeral. There is no long-term or special meaning associated with pod IP addresses or their location within the network.

Now consider traditional approaches to network security. Historically, in enterprise networks, network security was implemented using security appliances (or virtual versions of appliances) such as firewalls and routers. The rules enforced by these appliances were often based on a combination of the physical topology of the network and the allocation of specific IP address ranges to different classes of workloads.

As Kubernetes is based on a flat network, without any special meaning for pod IP addresses, very few of these traditional appliances are able to provide any meaningful workload-aware network security; instead, they have to treat the whole cluster as a single entity. In addition, in the case of east-west traffic between two pods hosted on the same node, the traffic does not even go via the underlying network. So these appliances won’t see this traffic at all and are essentially limited to north-south security, which secures traffic entering the cluster from external sources and traffic originating inside the cluster headed to destinations outside the cluster.

Given all of this, it should be clear that Kubernetes requires a new approach to network security. This new approach needs to cover a broad range of considerations, including:

  • New ways to enforce network security (which workloads are allowed to talk to which other workloads) that do not rely on special meanings of IP addresses or network topology and that work even if the traffic does not traverse the underlying network; the Kubernetes network policy is designed to meet these needs.

  • New tools to help manage network policies that support new development processes and the desire for microservices to bring increased organizational agility, such as policy recommendations, policy impact previews, and policy staging.

  • New ways to monitor and visualize network traffic, covering both cluster-scoped holistic views (e.g., how to easily view the overall network and the cluster’s network security status) and targeted topological views to drill down across a sequence of microservices to help troubleshoot or diagnose application issues.

  • New ways of implementing intrusion detection and threat defense, including policy violation alerting, network anomaly detection, and integrated threat feeds.

  • New remediation workflows, so potentially compromised workloads can be quickly and safely isolated during forensic investigation.

  • New mechanisms for auditing configuration and policy changes, and also Kubernetes-aware network flow logs to meet compliance requirements (since traditional network flow logs are IP-based and have little long-term meaning in the context of Kubernetes).
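
The first point, enforcing policy on workload identity rather than IP addresses, is what Kubernetes network policy does by selecting pods with labels. The following is a simplified simulation of NetworkPolicy ingress semantics, assuming a single namespace, `matchLabels` selectors only, and no `ipBlock`, `namespaceSelector`, or egress rules; the labels and port are hypothetical. Note that pod IP addresses never appear, so rescheduling a pod to a new address does not change the decision.

```python
from dataclasses import dataclass, field


def selects(match_labels: dict[str, str], labels: dict[str, str]) -> bool:
    """matchLabels semantics: every key/value pair must be present on the pod."""
    return all(labels.get(key) == value for key, value in match_labels.items())


@dataclass
class IngressPolicy:
    pod_selector: dict[str, str]   # pods this policy applies to (isolates)
    from_selector: dict[str, str]  # peer pods allowed to connect; {} means all pods
    # (protocol, port) pairs; an empty set means all ports
    ports: set[tuple[str, int]] = field(default_factory=set)


def ingress_allowed(policies: list[IngressPolicy], src_labels: dict[str, str],
                    dst_labels: dict[str, str], protocol: str, port: int) -> bool:
    applicable = [p for p in policies if selects(p.pod_selector, dst_labels)]
    if not applicable:
        return True  # pods selected by no policy are non-isolated
    return any(
        selects(p.from_selector, src_labels)
        and (not p.ports or (protocol, port) in p.ports)
        for p in applicable
    )


policies = [IngressPolicy({"app": "backend"}, {"app": "frontend"}, {("TCP", 8080)})]
frontend = {"app": "frontend", "tier": "web"}
backend = {"app": "backend"}
print(ingress_allowed(policies, frontend, backend, "TCP", 8080))          # True
print(ingress_allowed(policies, {"app": "batch"}, backend, "TCP", 8080))  # False
print(ingress_allowed(policies, backend, frontend, "TCP", 80))            # True
```

As with real NetworkPolicy, a pod that no policy selects accepts all traffic (the last call); once any ingress policy selects it, only traffic matching one of the rules is allowed.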

We will review an example of a typical Kubernetes deployment in an enterprise to understand these challenges. Figure 1-2 is a representation of a common deployment model for Kubernetes and microservices in a multicloud environment. A multicloud environment is one where an enterprise deploys Kubernetes in more than one cloud provider (Amazon Web Services, Google Cloud, etc.). A hybrid cloud environment is one where an enterprise has a Kubernetes deployment in at least one cloud provider environment and a Kubernetes deployment on-premises in its datacenter. Most enterprises have a dual cloud strategy and will have clusters running in Amazon Web Services (AWS), Microsoft Azure, or Google Cloud; many enterprises also have some legacy applications running in their datacenters. Workloads in the datacenter will likely be behind a security gateway that filters traffic coming in through the perimeter. Microservices running in these Kubernetes deployments are also likely to have one or more dependencies on:

  • Other cloud services like AWS RDS or Azure DB

  • Third-party API endpoints like Twilio

  • SaaS services like Salesforce or Zuora

  • Databases or legacy apps running inside the datacenter

Observability in Kubernetes is the ability to derive actionable insights about the state of Kubernetes from metrics collected (more on this later). While observability has other applications, like monitoring and troubleshooting, it is important in the context of network security too. Observability concepts applied to flow logs correlated with other Kubernetes metadata (pod labels, policies, namespaces, etc.) are used to monitor (and then secure) communications between pods in a Kubernetes cluster, to detect malicious activity by comparing IP addresses with known malicious IP addresses, and to detect malicious activity using machine learning–based techniques. These topics are covered in the next section. As you can see in Figure 1-2, the Kubernetes deployment poses challenges due to silos of data in each cluster and the potential loss of visibility when associating a workload in one cluster with a workload in another cluster or with an external service.

Figure 1-2. Example of a Kubernetes deployment in an enterprise

As shown in Figure 1-2, the footprint of a microservices application typically extends beyond the virtual private cloud (VPC) boundaries, and securing these applications requires a different approach from the traditional perimeter security approach. It is a combination of network security controls, observability, threat defense, and enterprise security controls. We will cover each of these next.

Network security controls

Native security controls available from cloud providers (for example, AWS Security Groups or Azure Network Security Groups) or security gateways (for example, next-generation firewalls) on the perimeter of the VPC or datacenter do not understand the identity of a microservice inside a Kubernetes cluster. For example, you cannot filter traffic to or from a Kubernetes pod or service with your security group rules or firewall policies. Additionally, by the time traffic from a pod hits a cloud provider’s network or a third-party firewall, the traffic (depending on the cloud provider’s architecture) has had source network address translation (SNAT) applied to it. In other words, the source IP address of traffic from all workloads on the node is set to the node IP, so any kind of allow/deny policies will, at best, have node-level (the node’s IP address) granularity.
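
The following toy model illustrates the granularity problem. It assumes the network plug-in masquerades (SNATs) pod traffic as it leaves the node, which depends on your CNI and cloud configuration; all addresses and workload names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


def snat(packet: Packet, node_ip: str) -> Packet:
    """Masquerade: rewrite the source address to the node IP as the packet leaves the node."""
    return Packet(node_ip, packet.dst_ip, packet.dst_port)


node_ip = "10.0.1.5"
pods_on_node = {"192.168.7.10": "frontend", "192.168.7.11": "batch-job"}  # pod IP -> workload
egress = [Packet(pod_ip, "10.20.0.8", 5432) for pod_ip in pods_on_node]
seen_by_firewall = {snat(p, node_ip).src_ip for p in egress}
print(seen_by_firewall)  # {'10.0.1.5'}: one source address for two workloads
```

Two unrelated workloads become indistinguishable to the firewall, so any rule that admits the frontend's database traffic also admits the batch job's.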

Kubernetes workloads are highly dynamic and ephemeral. Let’s say a developer commits a new check-in for a particular workload. The automated CI/CD workflow will kick in, build a new container image for the workload, and start deploying this new version of the workload in Kubernetes clusters. Kubernetes will perform a rolling update and deploy new instances of the workload. All of this happens in an automated fashion, and there is no room for manual or out-of-band workflows to reconfigure the security controls for the newly deployed workload.

You need a new security architecture to secure workloads running in a multi- or hybrid cloud infrastructure. Just like your workload deployment in a Kubernetes cluster, the security of the workload has to be defined as code, in a declarative model. Security controls have to be portable across Kubernetes distributions, clouds, infrastructures, and/or networks. These security controls have to travel with the workloads, so if a new version of the workload is deployed in a VPC for Amazon Elastic Kubernetes Service (EKS), instead of on-premises clusters, you can be assured that the security controls associated with the service will be seamlessly enforced without you having to rework any network topology, out-of-band configuration of security groups, or VPC/perimeter firewalls.

Network security controls are implemented by using a network policy solution that is native to Kubernetes and provides granular access controls. There are several well-known implementations of network policy (such as Calico, Weave Net, Kube-router, Antrea) that you can use. In addition to applying policy at Layer 3/Layer 4 (TCP/IP), we recommend you look at solutions that support application layer policy (such as HTTP/HTTPS). We also recommend picking a solution that is based on the popular proxy Envoy, as it is widely deployed for application-layer policy. Kubernetes supports deploying applications as microservices (small components serving a part of the application functionality) over a network of nodes. The communication between microservices relies on application protocols such as HTTP. Therefore, there is a need for granular application controls that can be implemented by application layer policy. For example, in a three-tier application, the frontend microservice may only be allowed to use HTTP GET-based requests with the backend database microservice (read access) and not allowed to use HTTP POST with the backend database microservice (write access). All these requests can end up using the same TCP connection, so it is essential to add a policy engine that supports application-level controls as described here.
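
The three-tier example can be sketched as a default-deny check evaluated per HTTP request, which is what lets an application-layer proxy such as Envoy distinguish a GET from a POST even when both travel over the same TCP connection. The `HttpRule` structure and the `app` label values are hypothetical; this is not the policy schema of Envoy, Calico, or any other product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpRule:
    src_app: str             # "app" label of the calling pod
    dst_app: str             # "app" label of the destination pod
    methods: frozenset[str]  # allowed HTTP methods
    path_prefix: str = "/"


def http_allowed(rules: list[HttpRule], src_app: str, dst_app: str,
                 method: str, path: str) -> bool:
    """Default deny: a request passes only if some rule matches all of its attributes."""
    return any(
        r.src_app == src_app
        and r.dst_app == dst_app
        and method.upper() in r.methods
        and path.startswith(r.path_prefix)
        for r in rules
    )


# Hypothetical rule: the frontend may read from the backend database service but not write.
rules = [HttpRule("frontend", "backend-db", frozenset({"GET"}))]
print(http_allowed(rules, "frontend", "backend-db", "GET", "/items/42"))  # True
print(http_allowed(rules, "frontend", "backend-db", "POST", "/items"))    # False
```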

Enterprise security controls

Now that you have the strategy for network access controls and observability defined, you should consider additional security controls that are important and prevalent in enterprises. Encryption of data in transit is a critical requirement for security and compliance. There are several options to consider for encryption using traditional approaches, like TLS-based encryption in your workloads; mutual TLS, which is part of a service mesh platform; or a VPN-based approach like WireGuard (which offers a crypto key–based VPN).
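
For the TLS-based options, the defining property of mutual TLS is that the server also demands and verifies a certificate from the client. A minimal server-side sketch using Python's standard `ssl` module follows; the certificate, key, and CA file paths are hypothetical parameters, and a service mesh would normally issue and rotate these certificates for you.

```python
import ssl
from typing import Optional


def mtls_server_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None,
                        client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that requires and verifies client certificates."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Requiring a client certificate is what makes this mutual TLS.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # this workload's identity
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)  # CA that signs peer workload certs
    return ctx


# In a real workload, pass the mounted cert, key, and CA paths (hypothetical here).
ctx = mtls_server_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```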

We recommend that you leverage the data collection that is part of your observability strategy to build the reports needed to help with compliance requirements for standards like PCI, HIPAA, GDPR, and SOC 2. You should also consider the ability to ensure continuous compliance, and you can leverage the declarative nature of Kubernetes to help with the design and implementation of continuous compliance. For example, you can respond to a pod failing a compliance check by using the pod’s compliance status to trigger necessary action to correct the situation (trigger an image update).
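
Continuous compliance can be thought of as a reconcile loop: observe each workload's compliance status and map failures to a corrective action. The sketch below shows one pass of such a loop; the `PodStatus` fields, the approved-tag mapping, and the action names are hypothetical. A real implementation would typically run as a controller and patch the owning Deployment rather than individual pods, letting Kubernetes roll out the change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodStatus:
    name: str
    image_repo: str
    image_tag: str
    compliant: bool  # outcome of a hypothetical compliance check


def reconcile(pods: list[PodStatus], approved_tags: dict[str, str]) -> list[tuple[str, str, str]]:
    """One reconcile pass: map each non-compliant pod to a remediation action."""
    actions = []
    for pod in pods:
        if pod.compliant:
            continue
        new_tag = approved_tags.get(pod.image_repo)
        if new_tag and new_tag != pod.image_tag:
            actions.append(("update-image", pod.name, f"{pod.image_repo}:{new_tag}"))
        else:
            # No compliant image is known, so escalate to a human.
            actions.append(("alert", pod.name, f"{pod.image_repo}:{pod.image_tag}"))
    return actions


pods = [
    PodStatus("web-1", "shop/web", "1.0", compliant=False),
    PodStatus("web-2", "shop/web", "1.1", compliant=True),
    PodStatus("legacy-1", "shop/legacy", "0.9", compliant=False),
]
print(reconcile(pods, {"shop/web": "1.1"}))
# [('update-image', 'web-1', 'shop/web:1.1'), ('alert', 'legacy-1', 'shop/legacy:0.9')]
```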

Threat defense

Threat defense in a Kubernetes cluster is the ability to look at malicious activity in the cluster and then defend the cluster from it. Malicious activity allows an adversary to gain unauthorized access and manipulate or steal data from a Kubernetes cluster. The malicious activity can occur in many forms, such as exploiting an insecure configuration or exploiting a vulnerability in the application traffic or the application code.

When you build your threat defense strategy, you must consider both intrusion detection and prevention. The key to intrusion detection is observability; you need to review data collected to scan for known threats. In a Kubernetes deployment, data collection is very challenging due to the large amount of data you need to inspect. We have often heard this question: “Do I need a Kubernetes cluster to collect data to defend a Kubernetes cluster?” The answer is “no.” We recommend you align your observability strategy with intrusion detection and leverage smart aggregation to collect and inspect data. For example, you can consider using a tool that aggregates data as groups of “similar” pods talking to each other on a given destination port and protocol, instead of using the traditional method of aggregating by the five-tuple (source IP, source port, destination IP, destination port, protocol). This approach will help significantly reduce data collected without sacrificing effectiveness. Remember, several pods running the same container image and deployed in the same way will generate similar network traffic for a transaction. You may ask, “What if only one instance is infected? How can I detect that?” That’s a good question. There are a few ways. You could pick a tool that supports machine learning–based detection of anomalous workloads using various collected metrics like connections, bytes, and packets. Another approach is to have a tool that can detect and match known malicious IPs and domains from well-known threat feeds as a part of collection, or log unaggregated network flows for traffic denied by policy. These are simple techniques that will help you build a strategy. Note that threat defense techniques evolve, and you will need a security research team to work with you to help understand your application and build a threat model to implement your threat defense strategy.
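
The aggregation idea can be illustrated as follows. The sketch assumes a hypothetical flow-record format and an IP-to-workload lookup (for example, derived from pod metadata); it groups allowed flows by source workload, destination workload, destination port, and protocol, and keeps denied flows unaggregated, as suggested above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    action: str  # "allow" or "deny", as reported by the policy engine


def aggregate(flows: list[Flow], workload_of: dict[str, str]):
    """Count allowed flows per (src workload, dst workload, dst port, protocol).

    Pod IPs and ephemeral source ports are dropped, so replicas of one deployment
    collapse into a single record. Denied flows are returned unaggregated.
    IPs missing from workload_of (e.g., external endpoints) are kept as-is.
    """
    summary = Counter()
    denied = []
    for f in flows:
        if f.action == "deny":
            denied.append(f)
            continue
        key = (workload_of.get(f.src_ip, f.src_ip), workload_of.get(f.dst_ip, f.dst_ip),
               f.dst_port, f.protocol)
        summary[key] += 1
    return summary, denied


workload_of = {"192.168.1.10": "frontend", "192.168.1.11": "frontend", "192.168.2.20": "backend"}
flows = [
    Flow("192.168.1.10", 40001, "192.168.2.20", 8080, "TCP", "allow"),
    Flow("192.168.1.11", 51234, "192.168.2.20", 8080, "TCP", "allow"),
    Flow("192.168.1.10", 40002, "192.168.2.20", 8080, "TCP", "allow"),
    Flow("192.168.1.11", 51300, "192.168.2.20", 22, "TCP", "deny"),
]
summary, denied = aggregate(flows, workload_of)
print(dict(summary))  # {('frontend', 'backend', 8080, 'TCP'): 3}
print(len(denied))    # 1
```

Three allowed flows with distinct five-tuples collapse into one workload-level record, while the denied connection attempt on port 22 is kept in full for investigation.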

Observability

Observability is very useful for monitoring and securing a distributed system like Kubernetes. Kubernetes abstracts a lot of details, and in order to monitor a system like it, you cannot collect, baseline, and monitor individual metrics independently (such as a single network flow, a pod create/destroy event, or a CPU spike on one node). What is needed is a way to monitor these metrics in the context of Kubernetes. For example, you may need to detect that a pod associated with a service or a deployment has restarted and is running a different binary from its peers, or that a pod’s activity (network, filesystem, kernel system calls) differs from that of other pods in the deployment. This becomes even more complex when you consider an application that comprises several services (microservices) that are in turn backed by several pods.
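
Comparing a pod with its peers, rather than baselining each metric in isolation, can be sketched like this. The input, a mapping from pod name to its deployment and a hash of the binary it is running, is a hypothetical signal with placeholder hash values; the same approach applies to network, file, or system call profiles. The majority vote is only meaningful when a deployment has at least three replicas.

```python
from collections import Counter


def odd_ones_out(pods: dict[str, tuple[str, str]]) -> list[str]:
    """pods maps pod name -> (deployment, hash of the binary the pod is running).

    Flags pods whose binary differs from the most common one in the same deployment.
    """
    by_deployment: dict[str, list[tuple[str, str]]] = {}
    for pod, (deployment, binary) in pods.items():
        by_deployment.setdefault(deployment, []).append((pod, binary))
    flagged = []
    for members in by_deployment.values():
        baseline, _ = Counter(binary for _, binary in members).most_common(1)[0]
        flagged.extend(pod for pod, binary in members if binary != baseline)
    return sorted(flagged)


pods = {
    "api-1": ("api", "sha256:aaa"),
    "api-2": ("api", "sha256:aaa"),
    "api-3": ("api", "sha256:bbb"),
    "web-1": ("web", "sha256:ccc"),
}
print(odd_ones_out(pods))  # ['api-3']
```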

Observability is useful in troubleshooting and monitoring the security of workloads in Kubernetes. As an example, observability in the context of a service in Kubernetes will allow you to do the following:

  • Visualize your Kubernetes cluster as a service graph, which shows how pods are associated with services and the communication flows between services

  • Overlay application (Layer 7) and network traffic (Layer 3/Layer 4) on the service graph as separate layers that will allow you to easily determine traffic patterns and traffic load for applications and for the underlying network

  • View metadata for the node where a pod is deployed (e.g., CPU, memory, or host OS details)

  • View metrics related to the operation of a pod, traffic load, application latency (e.g., HTTP duration), network latency (network round-trip time), or pod operation (e.g., RBAC policies, service accounts, or container restarts)

  • View DNS activity (DNS response codes, latency, load) for a given service (pods backing the service)

  • Trace a user transaction that needs communication across multiple services; this is also known as distributed tracing

  • View network communication of a given service to external entities

  • View Kubernetes activity logs (e.g., audit logs) for pods and resources associated with a given service
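The service graph in the first bullet is essentially pod-level flows rolled up to service-level edges. The following Python sketch shows that rollup under simple assumptions: flows arrive as hypothetical `(src_pod, dst_pod, bytes)` tuples, and a `pod_to_service` mapping (in a real cluster, derived from Endpoints or EndpointSlices) says which service each pod backs. Endpoints that back no known service, such as external IPs, are grouped under a placeholder label.

```python
from collections import defaultdict


def build_service_graph(flows, pod_to_service, unmapped="unmapped"):
    """Collapse pod-level flows into service-level edges.

    flows: iterable of (src_pod, dst_pod, nbytes) tuples (hypothetical format).
    pod_to_service: mapping from pod name to the service it backs.
    Returns {(src_service, dst_service): total_bytes}.
    """
    edges = defaultdict(int)
    for src_pod, dst_pod, nbytes in flows:
        src = pod_to_service.get(src_pod, unmapped)
        dst = pod_to_service.get(dst_pod, unmapped)
        edges[(src, dst)] += nbytes
    return dict(edges)


pod_to_service = {"frontend-1": "frontend", "frontend-2": "frontend", "cart-1": "cart"}
flows = [
    ("frontend-1", "cart-1", 1200),
    ("frontend-2", "cart-1", 800),
    ("cart-1", "203.0.113.10", 300),  # external destination
]
graph = build_service_graph(flows, pod_to_service)
print(graph)  # {('frontend', 'cart'): 2000, ('cart', 'unmapped'): 300}
```

Overlaying Layer 7 data would follow the same pattern, with a different value (for example, request counts or HTTP latencies) accumulated per edge.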

We will cover the details of observability, with examples of how it can help security, in subsequent chapters. Here, we briefly describe how you can use observability as part of your security strategy.

Network traffic visibility

As mentioned, adequately monitoring activity and the access controls applied to the cluster requires a solution that provides network flows aggregated at the service level, with context like namespaces, labels, service accounts, or network policies. For example, there is a significant difference between reporting that IP1 communicated with IP2 over port 8080 and reporting that pods labeled “frontend” communicated with pods labeled “backend” on certain ports, or reporting traffic patterns between deployments of pods in a Kubernetes cluster. This kind of reporting allows you to review communication from external entities and apply IP address–based threat feeds to detect activity from known malicious IP addresses or even traffic from unexpected geographical locations. We will cover details for these concepts in Chapter 11.
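Applying an IP address–based threat feed to labeled flows can be sketched with Python’s standard `ipaddress` module. The feed format (a list of CIDR strings) and the flow dictionaries with `src_labels` and `remote_ip` keys are assumptions for illustration; the point is that a match is reported with its Kubernetes labels rather than as a bare IP pair.

```python
import ipaddress


def load_feed(cidrs):
    """Parse a threat feed given as a list of CIDR strings (assumed format)."""
    return [ipaddress.ip_network(c) for c in cidrs]


def match_threats(flows, feed):
    """Return flows whose remote IP falls inside any network in the feed.

    flows: list of dicts with hypothetical keys 'src_labels' and 'remote_ip'.
    """
    hits = []
    for flow in flows:
        ip = ipaddress.ip_address(flow["remote_ip"])
        if any(ip in net for net in feed):
            hits.append(flow)
    return hits


feed = load_feed(["198.51.100.0/24", "203.0.113.66/32"])
flows = [
    {"src_labels": "app=frontend", "remote_ip": "198.51.100.23"},
    {"src_labels": "app=backend", "remote_ip": "192.0.2.10"},
]
hits = match_threats(flows, feed)
print(hits)  # [{'src_labels': 'app=frontend', 'remote_ip': '198.51.100.23'}]
```

The linear scan is fine for a small feed; feeds with many thousands of entries would typically be loaded into a prefix-matching structure instead.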

DNS activity logs

The Domain Name System (DNS) translates domain names into IP addresses. In your Kubernetes cluster, it is critical to review DNS activity logs to detect unexpected activity, for example, queries to known malicious domains, DNS response codes like NXDOMAIN, and unexpected increases in bytes and packets in DNS queries. We will cover details for these concepts in Chapter 11.
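Two of the DNS checks mentioned above, queries to known malicious domains and bursts of NXDOMAIN responses, can be sketched as follows. The `(client_pod, qname, rcode)` record format, the domain list, and the NXDOMAIN threshold are all assumptions; a production tool would read these from its DNS log source and tune thresholds per workload.

```python
from collections import Counter


def analyze_dns(records, malicious_domains, nxdomain_threshold):
    """Flag pods that query malicious domains or receive many NXDOMAINs.

    records: list of (client_pod, qname, rcode) tuples (hypothetical format).
    Returns (pods_querying_malicious, pods_with_many_nxdomain) as sets.
    """
    bad = set()
    nx = Counter()
    for pod, qname, rcode in records:
        name = qname.rstrip(".").lower()
        # Match the listed domain itself or any subdomain of it.
        if any(name == d or name.endswith("." + d) for d in malicious_domains):
            bad.add(pod)
        if rcode == "NXDOMAIN":
            nx[pod] += 1
    noisy = {pod for pod, count in nx.items() if count >= nxdomain_threshold}
    return bad, noisy


records = [
    ("web-1", "api.evil.example.", "NOERROR"),
    ("web-2", "kubernetes.default.svc.cluster.local.", "NOERROR"),
    ("job-9", "a1.example.com.", "NXDOMAIN"),
    ("job-9", "b2.example.com.", "NXDOMAIN"),
    ("job-9", "c3.example.com.", "NXDOMAIN"),
]
bad, noisy = analyze_dns(records, {"evil.example"}, nxdomain_threshold=3)
print(sorted(bad), sorted(noisy))  # ['web-1'] ['job-9']
```

Matching on a `.` boundary avoids false positives such as `notevil.example` matching `evil.example`.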

Application traffic visibility

We recommend that you review application traffic flows for suspicious activity, such as unexpected response codes and rare or known malicious HTTP headers (e.g., user-agent) or query parameters. HTTP is the most common protocol used in Kubernetes deployments, so it is important to work with your security research team to monitor HTTP traffic for malicious activity. If you use other application protocols (e.g., Kafka, MySQL), you need to do the same for those as well.
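As a rough illustration of the HTTP review described above, the following Python sketch flags requests whose user agent is rare within the observed traffic or whose status code falls outside an expected set. The request dictionaries, the 1% rarity cutoff, and the default set of expected status codes are assumptions that you would tune for each application.

```python
from collections import Counter

# Assumed "normal" status codes; tune per application.
EXPECTED_CODES = frozenset({200, 201, 204, 301, 302, 304})


def review_http(requests, rare_fraction=0.01, expected_codes=EXPECTED_CODES):
    """Flag requests with a rare user agent or an unexpected status code.

    requests: list of dicts with 'user_agent' and 'status' keys (hypothetical).
    A user agent is rare if it accounts for less than rare_fraction of requests.
    """
    ua_counts = Counter(r["user_agent"] for r in requests)
    total = len(requests)
    flagged = []
    for r in requests:
        rare = ua_counts[r["user_agent"]] / total < rare_fraction
        unexpected = r["status"] not in expected_codes
        if rare or unexpected:
            flagged.append(r)
    return flagged


requests = (
    [{"user_agent": "Mozilla/5.0", "status": 200}] * 150
    + [{"user_agent": "sqlmap/1.7", "status": 200}]
    + [{"user_agent": "Mozilla/5.0", "status": 500}]
)
flagged = review_http(requests)
print(flagged)
```

Rarity is relative to the traffic being analyzed, so in practice you would compute it per service, since a user agent that is normal for one service may be rare for another.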

Kubernetes activity logs

In addition to network activity logs, you must also monitor Kubernetes activity logs to detect malicious activity. For example, review access-denied logs for resource access, and review service account creation/modification. Review namespace creation/modification logs for unexpected activity. Also review the Kubernetes audit logs, which record requests to the Kubernetes API.
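The reviews above can be partly automated by scanning audit events. This sketch assumes the API server’s log backend writes one JSON audit event per line and reads only standard `audit.k8s.io/v1` event fields (`stage`, `verb`, `user.username`, `objectRef`, `responseStatus.code`). The choice of watched resources and verbs is illustrative, not a complete detection policy.

```python
import json

WATCHED_RESOURCES = {"serviceaccounts", "namespaces"}
MUTATING_VERBS = {"create", "update", "patch", "delete"}


def review_audit_log(lines):
    """Return (denied, sensitive_changes) summaries from JSON-lines audit events."""
    denied, changes = [], []
    for line in lines:
        event = json.loads(line)
        # Each request can be logged at several stages; keep the final one.
        if event.get("stage", "ResponseComplete") != "ResponseComplete":
            continue
        user = event.get("user", {}).get("username", "?")
        verb = event.get("verb", "")
        ref = event.get("objectRef", {})
        resource = ref.get("resource", "")
        code = event.get("responseStatus", {}).get("code")
        summary = f"{user} {verb} {resource}/{ref.get('name', '')}"
        if code == 403:
            denied.append(summary)  # access denied by authorization
        elif verb in MUTATING_VERBS and resource in WATCHED_RESOURCES:
            changes.append(summary)  # service account or namespace modified
    return denied, changes


sample = [
    '{"stage": "ResponseComplete", "verb": "list", "user": {"username": "system:serviceaccount:dev:builder"}, "objectRef": {"resource": "secrets", "namespace": "prod"}, "responseStatus": {"code": 403}}',
    '{"stage": "ResponseComplete", "verb": "create", "user": {"username": "alice"}, "objectRef": {"resource": "serviceaccounts", "namespace": "prod", "name": "deployer"}, "responseStatus": {"code": 201}}',
    '{"stage": "ResponseComplete", "verb": "get", "user": {"username": "bob"}, "objectRef": {"resource": "pods", "namespace": "prod", "name": "web-1"}, "responseStatus": {"code": 200}}',
]
denied, changes = review_audit_log(sample)
print(denied)   # ['system:serviceaccount:dev:builder list secrets/']
print(changes)  # ['alice create serviceaccounts/deployer']
```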

Machine learning/anomaly detection

Machine learning is a technique where a system is able to derive patterns from data over a period of time. The output is a machine learning model, which can then be used to make predictions and detect deviations in real data based on the prediction. We recommend you consider applying machine learning–based anomaly detection to various metrics to detect strange activity. A simple and effective way is to apply a machine learning technique known as baselining to individual metrics. This way you do not need to worry about applying rules and thresholds for each metric; the system does that for you and reports deviations as anomalies. Applying machine learning techniques to network traffic is a relatively new area and is gaining traction with security teams. We will cover this topic in detail in Chapter 6.
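To show what baselining an individual metric means in practice, here is a deliberately simple Python sketch. It swaps in a plain statistical baseline (a rolling mean and standard deviation, flagging values more than three standard deviations away) rather than a trained machine learning model; the window size, threshold, and bytes-per-minute series are assumptions for illustration.

```python
import statistics


def detect_anomalies(series, window=10, threshold=3.0):
    """Return indices of points that deviate from a rolling baseline.

    For each point after the first `window` points, the baseline is the mean
    and population standard deviation of the preceding `window` values. A
    point is anomalous if it lies more than `threshold` deviations away.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Perfectly flat history: any change at all is a deviation.
            if series[i] != mean:
                anomalies.append(i)
            continue
        if abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical bytes per minute sent by one deployment.
series = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 100, 950, 99]
print(detect_anomalies(series))  # [11]
```

Note that the spike at index 11 is absorbed into later baselines and widens them; real baselining systems usually exclude flagged points from the baseline or use more robust statistics.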

You can choose from many solutions for your Kubernetes observability strategy, such as Datadog, Calico Enterprise, and cloud provider–based solutions from Google, AWS, and Azure.

Security Frameworks

Finally, we want to make you aware of security frameworks that give the industry a common methodology and terminology for security best practices. Security frameworks are a great way to understand attack techniques and best practices to defend against and mitigate attacks. You should use them to build and validate your security strategy. Note that these frameworks may not be specific to Kubernetes, but they provide insight into techniques adversaries use in attacks, and security researchers will need to review them to determine whether they are relevant to Kubernetes. We will review two well-known frameworks: MITRE and the Threat Matrix for Kubernetes.


MITRE ATT&CK® is a knowledge base of adversary tactics and techniques based on real-world observations of cyberattacks. The MITRE ATT&CK® Matrix for Enterprise is useful because it categorizes tactics and techniques by each stage of the cybersecurity kill chain. The kill chain describes the stages of a cyberattack and is useful for building an effective defense against one. MITRE also provides an attack matrix tailored for cloud environments like AWS, Google Cloud, and Microsoft Azure.

Figure 1-3 describes the MITRE ATT&CK® Matrix for AWS. We recommend that you review each of the stages described in the attack matrix as you build your threat model for securing your Kubernetes cluster.

Figure 1-3. Attack matrix for cloud environments in AWS

Threat matrix for Kubernetes

The other framework is the Threat Matrix for Kubernetes, a Kubernetes-specific application of the generic MITRE attack matrix. Microsoft published it based on security research and real-world attacks. This is another excellent resource to use to build and validate your security strategy.

Figure 1-4 shows the stages that are relevant to your Kubernetes cluster. They map to the various stages we discussed in this chapter. For example, for build, deploy, and runtime security, respectively, you should consider the “Compromised images in registry” technique in the Initial Access stage, the “Access cloud resources” technique in the Privilege Escalation stage, and the “Cluster internal networking” technique in the Lateral Movement stage.

Figure 1-4. Threat matrix for Kubernetes
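One lightweight way to use the matrix to validate a strategy is to keep an inventory of the techniques you care about, tagged with tactic and lifecycle stage, and check which ones have no mitigating control yet. The Python sketch below encodes only the three techniques and stage mapping from the example above; the `controls` inventory is hypothetical and stands in for whatever your team actually tracks.

```python
# Techniques from the Threat Matrix for Kubernetes discussed above, tagged with
# their tactic and the lifecycle stage this chapter associates them with.
TECHNIQUES = [
    ("Compromised images in registry", "Initial Access", "build"),
    ("Access cloud resources", "Privilege Escalation", "deploy"),
    ("Cluster internal networking", "Lateral Movement", "runtime"),
]


def coverage_gaps(controls):
    """Return techniques that have no mitigating control recorded.

    controls: mapping from technique name to a list of control descriptions
    (a hypothetical inventory maintained by your team).
    """
    return [
        (technique, tactic, stage)
        for technique, tactic, stage in TECHNIQUES
        if not controls.get(technique)
    ]


controls = {
    "Compromised images in registry": ["image scanning in CI", "signed images only"],
    "Cluster internal networking": ["default-deny network policy"],
}
for technique, tactic, stage in coverage_gaps(controls):
    print(f"No control for '{technique}' ({tactic}, {stage} stage)")
# prints: No control for 'Access cloud resources' (Privilege Escalation, deploy stage)
```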

Security and Observability

In a dynamic environment like Kubernetes, you can achieve a secure deployment of your applications by thinking about security and observability together. For example, you need to “observe” your cluster to find the optimal way to implement controls to secure it. Kubernetes has been widely adopted as an orchestration engine because it is declarative, allowing users to specify higher-level outcomes. Kubernetes also has built-in capabilities to ensure that your cluster operates according to its specifications. It does this by monitoring various attributes and taking action (e.g., restarting a pod) if an attribute deviates from its specified value for a period of time. These aspects of Kubernetes make it difficult to implement the visibility and controls needed to secure a cluster, because the controls you implement need to be aligned with Kubernetes operations. Therefore, before you add any controls to Kubernetes, it is important to understand the context. For example, you cannot isolate a pod by applying a policy that does not allow it to communicate with anything else. Kubernetes will detect that the pod is not able to communicate with other elements (e.g., the API server), determine that the pod is not operating as specified, and restart the pod or spin it up somewhere else in the cluster.

First, understand how the pod operates and what its expected behavior is, and then apply controls or detect unexpected events. Next, determine whether an unexpected event is an operations issue or a security issue, and apply the required remediation. Observability and security go hand in hand in this process: you observe to understand what is expected and apply controls to ensure expected operation, then observe to detect unexpected events and analyze them, and then add the controls necessary to remediate any issue caused by the event. Therefore, you need a holistic approach to security and observability when you think about securing your clusters.
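The observe-then-detect loop described above can be sketched minimally in Python. During an observation window, every `(source, destination, port)` connection is recorded as expected behavior; afterward, connections outside that set are surfaced once each for triage as an operations or a security issue. The tuple format and workload names are hypothetical. Note that the learned set includes traffic Kubernetes itself depends on (here, DNS), which is exactly what a naive isolation policy would break.

```python
def learn_expected(observed_flows):
    """Observation phase: record every (src, dst, port) edge seen as expected."""
    return set(observed_flows)


def detect_unexpected(expected, new_flows):
    """Detection phase: return flows not seen during observation, in order of
    first appearance and without duplicates, for triage."""
    seen = set()
    unexpected = []
    for flow in new_flows:
        if flow not in expected and flow not in seen:
            seen.add(flow)
            unexpected.append(flow)
    return unexpected


expected = learn_expected([
    ("frontend", "backend", 8080),
    ("backend", "kube-dns", 53),  # cluster DNS must stay reachable
    ("backend", "postgres", 5432),
])
unexpected = detect_unexpected(expected, [
    ("frontend", "backend", 8080),
    ("backend", "203.0.113.50", 4444),
    ("backend", "203.0.113.50", 4444),
])
print(unexpected)  # [('backend', '203.0.113.50', 4444)]
```

Each flagged flow then feeds the next step in the loop: decide whether it is an operations or security issue, and add a control (such as a network policy rule) only after you understand its impact on normal operation.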


By now you should have a high-level overview of what Kubernetes security and observability entails. These are the foundational concepts that underpin this entire book. In short:

  • Security for Kubernetes is very different from traditional security and requires a holistic security and observability approach at all the stages of workload deployment—build, deploy, and runtime.

  • Kubernetes is declarative and abstracts the details of workload operations, which means workloads can be running anywhere over a network of nodes. Also, workloads can be ephemeral, where they are destroyed and re-created on a different node. Securing such a declarative distributed system requires that you think about security at all stages.

  • We hope you understand the importance of collaboration between the application, platform, and security teams when designing and implementing a holistic security approach.

  • MITRE and the Threat Matrix for Kubernetes are two security frameworks that are widely adopted by security teams.

It’s important that you take in all of this together, because a successful security and observability strategy is a holistic one. In the next chapter, we will cover infrastructure security.
