Chapter 6. Operating Clusters

If Tetris has taught me anything, it’s that errors pile up and accomplishments disappear.

Andrew Clay Shafer

Once you have a Kubernetes cluster, how do you know it’s in good shape and running properly? How do you scale to cope with demand, but keep cloud costs to a minimum? In this chapter we’ll look at the issues involved in operating Kubernetes clusters for production workloads, and some of the tools that can help you.

As we’ve seen in Chapter 3, there are many important things to consider about your Kubernetes cluster: availability, authentication, upgrades, and so on. If you’re using a good managed Kubernetes service, as we recommend, most of these issues should be taken care of for you.

However, what you actually do with the cluster is up to you. In this chapter you’ll learn how to size and scale the cluster, check it for conformance, find security problems, and test the resilience of your infrastructure with chaos monkeys.

Cluster Sizing and Scaling

How big does your cluster need to be? With self-hosted Kubernetes clusters, and almost all managed services, the ongoing cost of your cluster depends directly on the number and size of its nodes. If the capacity of the cluster is too small, your workloads won’t run properly, or will fail under heavy traffic. If the capacity is too large, you’re wasting money.

Sizing and scaling your cluster appropriately is very important, so let’s look at some of the decisions involved.

Capacity Planning

One ...
