Chapter 7. Troubleshooting

Establishing a Kubernetes cluster is one thing. Making sure that the cluster stays operational is another. As a Kubernetes administrator, you are continuously responsible for keeping the cluster functional. Your troubleshooting skills must therefore be sharp enough to devise strategies for identifying the root cause of an issue and fixing it.

Of all the domains covered by the exam, the “Troubleshooting” domain carries the highest weight in the overall score, so it’s important to understand failure scenarios and learn how to fix them. This chapter addresses how to monitor and troubleshoot applications in different scenarios. Furthermore, we’ll discuss failures that may arise in cluster components due to misconfiguration or error conditions.

At a high level, this chapter covers the following concepts:

  • Evaluating logging options

  • Monitoring applications

  • Accessing container logs

  • Troubleshooting application failures

  • Troubleshooting cluster failures

Evaluating Cluster and Node Logging

A real-world Kubernetes cluster manages hundreds or even thousands of Pods. Every Pod runs at least one container, and each container runs a process. Each process can write log output to the standard output or standard error streams. Capturing that output is essential for efficiently determining the root cause of an application error. Cluster components also produce logs for diagnostic purposes.
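To make the stdout/stderr convention concrete, here is a minimal Python sketch of an application process that follows it, assuming a plain script running as a container's main process. The handler `handle_request` and the `INFO`/`ERROR` message prefixes are hypothetical and only serve as illustration. Informational messages go to standard output and errors go to standard error; the container runtime captures both streams, and you can then view them with `kubectl logs <pod-name>`.

```python
import sys


def log_info(message: str) -> None:
    """Write an informational message to standard output."""
    # flush=True so the line is emitted immediately rather than buffered
    print(f"INFO {message}", file=sys.stdout, flush=True)


def log_error(message: str) -> None:
    """Write an error message to standard error."""
    print(f"ERROR {message}", file=sys.stderr, flush=True)


def handle_request(order_id: int) -> bool:
    """Hypothetical request handler that logs success or failure."""
    if order_id <= 0:
        log_error(f"invalid order id {order_id}")
        return False
    log_info(f"processed order {order_id}")
    return True


if __name__ == "__main__":
    # One successful request (stdout) and one failed request (stderr)
    handle_request(42)
    handle_request(-1)
```

Writing to the standard streams rather than to a file inside the container keeps the application unaware of where its logs end up; collecting and storing them is left to the node and the cluster.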

As you can see, Kubernetes’ ...