Chapter 4. Security

Security is an incredibly wide field that could easily fill multiple books on its own. In fact, there are a lot of great books on Kubernetes security already. However, operating OpenShift clusters cannot be done without security in mind. The cost of mistakes when it comes to security-related tasks is higher than in most other areas of operating a cluster. Recent data breaches and hacks have cost companies hundreds of millions of dollars, and that does not even account for issues that may still be undiscovered.

This chapter covers the fundamental concepts of securing your cluster and your workloads, while staying more abstract than other chapters, focusing on concepts over implementation.

Cluster Access

When you have your cluster set up, you will have access to it using the kubeadmin account. That is neither very secure nor scalable: you would need to hand out the password to everyone who wants to use the cluster, and all of them would be admins. Instead, you will want to provision users. There are different methods for this, starting with the easiest: creating a user by hand using the CLI. That doesn't scale well either, so OpenShift comes with the ability to provision users automatically with the help of identity providers (IdPs). Currently, the following identity providers can be used with OpenShift:

  • HTPasswd
  • Keystone
  • LDAP
  • Basic Authentication
  • Request Header
  • GitHub
  • GitLab
  • Google
  • OpenID Connect

The implementation can vary a bit, but generally the steps are:

  • Create an OAuth app with the identity provider

  • Add the client secret to OpenShift

  • Add the client ID to OpenShift

  • Optionally, add a Certificate Authority (CA) for your IdP instance to OpenShift

After you create the OAuth app with your identity provider of choice, you will be shown the client ID and client secret. Usually, you will not be able to view the secret again, so you might want to save it in a password store; otherwise, you will have to regenerate it later.

You will now have to put the information you have into OpenShift objects. The Client Secret goes into a secret with the following command:

$ oc create secret generic github-client-secret \
--from-literal=clientSecret=superSecretClientSecret -n openshift-config
secret/github-client-secret created

Now that this is accessible for OpenShift, you can create the required Custom Resource, which will look something like Example 4-1. You have to fill in your Client ID and also the organization and teams. This is to ensure only users that are part of those organizations and teams will have access.

Example 4-1. OAuth Configuration
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: githubidp
    mappingMethod: claim
    type: GitHub
    github:
      ca:
        name: ca-config-map
      clientID: 1234ClientID5678
      clientSecret:
        name: github-client-secret
      hostname: ...
      organizations:
      - myorganization
      teams:
      - myorganization1/team-a

One other interesting and important value is the mappingMethod. It determines how users are created on the cluster. The options are:

  • claim

  • lookup

  • generate

  • add

Not all of them are available for all identity providers, but in the case of GitHub as an identity provider, they are.

Logically they all do different things; claim has become the most common choice. With the mapping set to claim, OpenShift will attempt to create a new user on the cluster when someone logs in through the identity provider for the first time, and will fail if a user with that name already exists on the cluster but authenticated through a different method.

The lookup method checks whether a user already exists for the identity someone is trying to log in with. This requires an additional provisioning process, because users will not be created on the cluster automatically.

The slightly more generous alternative to claim is generate. It works similarly to claim, but if a given user already exists on the cluster with a different identity, OpenShift will try to provision a new user. For example, if manuel logged in via the also-provisioned Google identity provider and now tries to log in via GitHub, the generate mappingMethod will try to create a new user: manuel2 or similar.

If you want to avoid that, you need to use the add method. Here OpenShift adds the new identity to the existing user.
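Switching the identity provider from Example 4-1 to the add method is a one-line change. The following sketch repeats only the relevant part of the configuration, with placeholder values carried over from the earlier example:

```yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: githubidp
    mappingMethod: add   # merge the GitHub identity into an existing user
    type: GitHub
    github:
      clientID: 1234ClientID5678
      clientSecret:
        name: github-client-secret
      organizations:
      - myorganization
```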

Role-Based Access Control

Besides gaining access to your cluster, you need a specific set of privileges to perform any actions on it. Kubernetes has the concept of Role-Based Access Control (RBAC), which OpenShift uses as well.

The following sections go over the nuances of RBAC in more detail.

Roles and ClusterRoles

A role is a specific set of permissions. The permissions start from nothing and then add up. There is no explicit deny, mainly because denying certain privileges is only necessary if they have been granted before, which should not happen, since you start from zero privileges. It is therefore crucial to follow the principle of least privilege: only assign privileges that are absolutely required.

To assign a certain set of privileges, you need to understand how OpenShift resources work: every API has resources that you can execute certain actions against. Those actions are defined as verbs. The following example shows an interaction against the core API, using the get verb for resources of type pod:

$ oc get pods

All interactions with any OpenShift resource follow this concept, and while you can extend OpenShift with Custom Resource Definitions and thereby add new resources and APIs, the list of verbs is fixed to the following:

  • get

  • list

  • watch

  • create

  • update

  • patch

  • delete

  • deletecollection

  • proxy

When you start to create a role, you have to ask yourself, “What do I want to do against which resource for which API?” From there you go ahead and build a role definition. The following example shows a role definition that would allow the example command oc get pods:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "list"]

You might have noticed that in addition to the pure get verb, the role also grants list. If you were to leave that out, you could only get one specific pod whose name you already know, whereas the oc get pods command has to list all pods in order to display them.
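To make the difference concrete, here is a sketch of a Role that grants get without list, restricted to a single pod by name. resourceNames is a standard RBAC field; the pod name my-known-pod is a placeholder:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: single-pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  resourceNames: ["my-known-pod"] # placeholder; get works against this pod only
  verbs: ["get"]
```

A user bound to this role could run oc get pod my-known-pod, but oc get pods would be denied, because listing cannot be restricted by resourceNames.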

You can also see that there is a namespace parameter. This is the difference between a Role and a ClusterRole: Roles are namespaced, so the preceding definition only permits the actions in the default namespace. The following example for a ClusterRole will give the same permissions, but cluster wide:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-reader-cluster
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "list"]

In addition to removing the namespace limitation, ClusterRoles are also the only means of granting access to cluster-wide resources, such as nodes: you cannot use a Role for those and must use a ClusterRole instead.
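As a sketch of that second use case, the following ClusterRole grants read access to nodes, a cluster-wide resource that no Role could cover. The name node-reader is an assumption for this example:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""] # nodes live in the core API group
  resources: ["nodes"]
  verbs: ["get", "list"]
```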

RoleBindings and ClusterRoleBindings

Now that you have defined a certain set of privileges, you need a way to assign them to a user or a system account. That is the task of RoleBindings. A RoleBinding has a set of subjects, which are the users, groups, or service accounts the permissions apply to. Think of it as “They are subject to being assigned those permissions.” The other part is the roleRef, the role you want to bind to the subjects. The following example assigns our pod-reader role to the user Manuel:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: Manuel
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

After the preceding is applied, Manuel would be granted the permissions to get and list pods in the default namespace.

You can assign the same role to multiple users in the same RoleBinding. In the next example, we assign the pod-reader role to Manuel and Rick:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: Manuel
  apiGroup: rbac.authorization.k8s.io
- kind: User
  name: Rick
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Note that the name variable is case-sensitive. That means Manuel is not the same user as manuel, which is easy to trip over because an IDE will commonly not flag it as an issue.
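Subjects are not limited to individual users. The following sketch binds the same pod-reader role to a group instead; the group name developers is an assumption, and the group would have to exist on the cluster:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-group
  namespace: default
subjects:
- kind: Group
  name: developers # assumed group name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```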

The equivalent to a RoleBinding is the ClusterRoleBinding. It grants access to resources cluster-wide. The following example will grant Rick and Manuel access to get and list all pods in all namespaces:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-pods-cluster
subjects:
- kind: User
  name: Manuel
  apiGroup: rbac.authorization.k8s.io
- kind: User
  name: Rick
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: pod-reader-cluster
  apiGroup: rbac.authorization.k8s.io
Warning

Every permission in your OpenShift cluster is set up this way, and you should be wary of granting too many permissions, even if it might become cumbersome at some point. This is especially true when you start to put workloads on your cluster and have to start using ServiceAccounts.

CLI

The oc command line interface gives you the ability to perform some of the preceding actions ad hoc as well as get an overview of the roles and bindings that currently exist on your cluster.

To check which roles exist in your current namespace and what permissions they grant, execute the following command:

$ oc describe role.rbac
Name:         prometheus-k8s
Labels:       app.kubernetes.io/component=prometheus
              app.kubernetes.io/instance=k8s
              app.kubernetes.io/name=prometheus
              app.kubernetes.io/part-of=openshift-monitoring
              app.kubernetes.io/version=2.32.1
Annotations:  <none>
PolicyRule:
Resources                    Non-Resource URLs  Resource Names  Verbs
---------                    -----------------  --------------  -----
endpoints                    []                 []              [get list watch]
pods                         []                 []              [get list watch]
services                     []                 []              [get list watch]
ingresses.extensions         []                 []              [get list watch]
ingresses.networking.k8s.io  []                 []              [get list watch]

The same command works for ClusterRoles as well; just be prepared for a lot more output, given the vast number of moving bits and pieces that are on an OpenShift cluster out of the box.

You can also find out which RoleBindings are in the current namespace using the following command:

$ oc describe rolebindings.rbac
Name:         prometheus-k8s
Labels:       app.kubernetes.io/component=prometheus
              app.kubernetes.io/instance=k8s
              app.kubernetes.io/name=prometheus
              app.kubernetes.io/part-of=openshift-monitoring
              app.kubernetes.io/version=2.32.1
Annotations:  <none>
Role:
  Kind:  Role
  Name:  prometheus-k8s
Subjects:
  Kind            Name            Namespace
  ----            ----            ---------
  ServiceAccount  prometheus-k8s  openshift-monitoring

Other than displaying the existing state of a cluster, you can also use the CLI to apply changes. For example, you can add Manuel to the pod-reader role:

$ oc adm policy add-role-to-user pod-reader Manuel -n default
clusterrole.rbac.authorization.k8s.io/pod-reader added: "Manuel"

ServiceAccounts

When you have certain tasks to automate, you might want to use a system account rather than an actual user and their token. ServiceAccounts obey exactly the same permission rules as any other user on the cluster, so the important aspects here are how to create a ServiceAccount and then how to use it.

To create a service account, run the following command:

$ oc create sa my-bot
serviceaccount/my-bot created

You can also retrieve a ServiceAccount and its information:

$ oc get sa my-bot -o yaml
apiVersion: v1
imagePullSecrets:
- name: my-bot-dockercfg-bpf5n
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-08-03T13:57:30Z"
  name: my-bot
  namespace: default
  resourceVersion: "25019623"
  uid: f718f7be-abce-47f0-a43b-0dc02b044bcf
secrets:
- name: my-bot-token-zwskx
- name: my-bot-dockercfg-bpf5n

You can now add any privileges to it. In the following example, we add the pod-reader role to the bot account:

$ oc policy add-role-to-user pod-reader system:serviceaccount:default:my-bot
clusterrole.rbac.authorization.k8s.io/pod-reader added: "system:serviceaccount:default:my-bot"

Now that the my-bot account can do something, you can start to use it from outside the cluster. In this simple example, you could use it to list all pods in the default namespace without being authenticated yourself:

$ export TOKEN=$(oc sa get-token my-bot)

$ oc get pods -n default --token $TOKEN
NAME
grafana-deployment-787fd449f4-z5vkf
grafana-operator-controller-manager-5b5d45b9bd-wrnsv

$ oc whoami --token $TOKEN
system:serviceaccount:default:my-bot

The last command is not strictly needed, but it makes clear that the command was executed as the ServiceAccount.

The use cases for ServiceAccounts are wide, but generally, if you need to perform any action from outside the cluster in an automated way, you should probably be using a ServiceAccount. Since there is no limit to the number of ServiceAccounts that can exist on a given cluster, they are also a great way to follow the principle of least privilege.

For deployments, each app can have its own ServiceAccount with privileges scoped to just what is needed to deploy, so there is no scenario in which a single compromised token or account can interfere with your whole platform at once. This separation adds an extra layer of basic protection that you should have enabled in all of your infrastructure. You would not grant a user sudo privileges on a Linux host just to create a deployment, would you?
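A per-app setup like that can be sketched as follows. The ServiceAccount app-deployer and the Role deployer are hypothetical names; the Role itself would grant only the verbs and resources the deployment process actually needs:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-deployer
  namespace: default
subjects:
- kind: ServiceAccount
  name: app-deployer # hypothetical per-app ServiceAccount
  namespace: default
roleRef:
  kind: Role
  name: deployer # hypothetical Role scoped to deployment resources
  apiGroup: rbac.authorization.k8s.io
```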

Now that you have access to the cluster and have assigned some privileges, it is time to remove the default kubeadmin user. You should not log in as kubeadmin anymore, and while the password is fairly complex and therefore not easy to guess or brute-force, an unused admin account is definitely not what you want. Before you do that, double-check the following:

  • You have configured an Identity Provider

  • You have assigned the cluster-admin role to a user

Both of these are crucial, as you will otherwise lock yourself out of admin access.
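The second item on that checklist can be sketched like this; cluster-admin is a built-in ClusterRole, while the user name manuel is an assumption and would be whichever user your identity provider provisioned:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: manuel-cluster-admin
subjects:
- kind: User
  name: manuel # assumed user name, provisioned via the identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin # built-in ClusterRole with full cluster access
  apiGroup: rbac.authorization.k8s.io
```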

If you have done both of these, you can now log in as a user with the cluster-admin role and execute the following command:

$ oc delete secrets kubeadmin -n kube-system

Threat Modelling

Threat modelling means looking at potential threats to your system and mapping out their likelihood and impact. While your security team is busy doing that, it is good to evaluate the risks to your platform yourself and think about how best to protect against them.

To get started, think about the following scenario: someone writes all their passwords on a little notepad in their home. Is this good or bad?

You might be quick to say that they should use a password manager or something similar, but is it a realistic scenario that someone breaks into their home and steals the notepad to get their passwords? For the average citizen, no, it is not a real issue. Yes, there are better ways, but generally a notepad is OK and definitely better than using the same password everywhere.

But if the person using a notepad for all their passwords is the CEO of a Fortune 500 company, that’s different, and there is, in fact, some chance that someone will try to steal their password notepad.

The same applies to your OpenShift cluster, and you should keep that in mind when you are hardening your cluster. If you are running CodeReady Containers on your laptop to learn and understand OpenShift, it is probably fine to just keep using the kubeadmin user. If you run a production cluster for your company that is exposed to the internet, you probably want to lock it down as described in the preceding sections with access control and RBAC rules.

Threat modelling a little in your head before you do things makes life easier for you and for your security team, as you will be able to explain what you did to protect your infrastructure and why you might not have done certain other actions.

Workloads

Locking down your cluster is one thing, but you want to use your cluster at some point, and that is when you start to deploy workloads, which need some security considerations as well. You want to be in a state where you make it as hard as possible to compromise a given workload and then even harder for a potential attacker to make lateral movements from there. This means even a single compromised workload should not lead to a compromise of your whole cluster.

One common type of workload is web services, or anything else that is exposed via a route, and unless it is customer facing and open to the world, you most definitely should lock it down. The easiest way to do this out of the box is to use OpenShift's native capabilities: OpenShift allows you to deploy an OAuth proxy in front of your application and thereby leverage the RBAC permission model you already have to define access rights for your exposed service.

In the following example, you will deploy a simple game, but since it’s still a prototype, you only want the users who work on it to be able to access it.

First, you need a new namespace:

$ oc create namespace s3e
namespace/s3e created

$ oc project s3e
Now using project "s3e" on server "https://api.crc.testing:6443".

Now that you have a place for your game to live, it’s time to get started with the proxy. You need a ServiceAccount for it to use:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: proxy
  annotations:
    serviceaccounts.openshift.io/oauth-redirectreference.primary: >-
      {"kind":"OAuthRedirectReference",
      "apiVersion":"v1",
      "reference":{"kind":"Route","name":"proxy"}}

Next, get your app deployment ready that includes the proxy:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: proxy
  template:
    metadata:
      labels:
        app: proxy
    spec:
      serviceAccountName: proxy
      containers:
      - name: oauth-proxy
        image: openshift/oauth-proxy:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8443
          name: public
        args:
        - --https-address=:8443
        - --provider=openshift
        - --openshift-service-account=proxy
        - --upstream=http://localhost:8080
        - --tls-cert=/etc/tls/private/tls.crt
        - --tls-key=/etc/tls/private/tls.key
        - --cookie-secret=SECRET
        volumeMounts:
        - mountPath: /etc/tls/private
          name: proxy-tls
      - name: app
        image: quay.io/operatingopenshift/s3e-game:latest
      volumes:
      - name: proxy-tls
        secret:
          secretName: proxy-tls

If you use a different image, make sure that the --upstream argument still works correctly. In this case the game exposes itself on port 8080, so this is perfectly fine.

You can configure the proxy further using the --openshift-sar argument to lock access down based on specific RBAC permissions. If you do not set this argument, as in this case, every user who can log in to OpenShift can also access whatever is behind the proxy, which seems sane for this scenario.
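As a sketch, such a restriction could look like the following fragment of the oauth-proxy container args. The JSON describes a subject access review that each user must pass; the namespace and resource values here are assumptions for this app:

```yaml
args:
- --openshift-sar={"namespace":"s3e","resource":"services","name":"proxy","verb":"get"}
```

With this in place, only users whose RBAC permissions allow getting the proxy service in the s3e namespace would get past the login screen.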

From here on you will also need a service and a route:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: proxy
spec:
  to:
    kind: Service
    name: proxy
  tls:
    termination: reencrypt
---
apiVersion: v1
kind: Service
metadata:
  name: proxy
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: proxy-tls
spec:
  ports:
  - name: proxy
    port: 443
    targetPort: 8443
  selector:
    app: proxy

The selector here specifies that the proxy is the backend to send traffic to.

You can put all of the above in a single YAML file and then deploy it to the cluster like so:

kind: List
apiVersion: v1
items:
# Create a proxy service account and ensure it will use the route "proxy"
- apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: proxy
    annotations:
      serviceaccounts.openshift.io/oauth-redirectreference.primary: >-
        {"kind":"OAuthRedirectReference",
        "apiVersion":"v1",
        "reference":{"kind":"Route","name":"proxy"}}
# Create a secure connection to the proxy via a route
- apiVersion: route.openshift.io/v1
  kind: Route
  metadata:
    name: proxy
  spec:
    to:
      kind: Service
      name: proxy
    tls:
      termination: reencrypt
- apiVersion: v1
  kind: Service
  metadata:
    name: proxy
    annotations:
      service.alpha.openshift.io/serving-cert-secret-name: proxy-tls
  spec:
    ports:
    - name: proxy
      port: 443
      targetPort: 8443
    selector:
      app: proxy
# Launch a proxy as a sidecar
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: proxy
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: proxy
    template:
      metadata:
        labels:
          app: proxy
      spec:
        serviceAccountName: proxy
        containers:
        - name: oauth-proxy
          image: openshift/oauth-proxy:latest
          imagePullPolicy: IfNotPresent
          ports:
          - containerPort: 8443
            name: public
          args:
          - --https-address=:8443
          - --provider=openshift
          - --openshift-service-account=proxy
          - --upstream=http://localhost:8080
          - --tls-cert=/etc/tls/private/tls.crt
          - --tls-key=/etc/tls/private/tls.key
          - --cookie-secret=SECRET
          volumeMounts:
          - mountPath: /etc/tls/private
            name: proxy-tls

        - name: app
          image: quay.io/operatingopenshift/s3e-game:latest
        volumes:
        - name: proxy-tls
          secret:
            secretName: proxy-tls

To apply the configuration, execute the following command:

$ oc apply -f deploy-snake-proxy.yaml
serviceaccount/proxy created
route.route.openshift.io/proxy created
service/proxy created
deployment.apps/proxy created

You can now get the route URL via oc get route like so:

$ oc get route -o json | jq .items[].spec.host
"proxy-s3e.apps-crc.testing"

If you bring that URL up in your browser, you will be presented with a login screen. This also lets you know that you secured your app via the OAuth proxy.

Summary

In this chapter, we discussed security best practices and considerations for planning and operating your OpenShift cluster. These will give you a starting point. However, as clusters and adversaries are both ever evolving, it is important to regularly revisit the steps you took.
