
Kubernetes Best Practices, 2nd Edition

by Brendan Burns, Eddie Villalba, Dave Strebel, Lachlan Evenson

October 2023

Intermediate to advanced

324 pages

7h 46m

English

O'Reilly Media, Inc.



Content preview from Kubernetes Best Practices, 2nd Edition

Chapter 3. Monitoring and Logging in Kubernetes

In this chapter, we discuss best practices for monitoring and logging in Kubernetes. We’ll dive into the details of different monitoring patterns, important metrics to collect, and building dashboards from these raw metrics. We then wrap up with examples of implementing monitoring for your Kubernetes cluster.

Metrics Versus Logs

You first need to understand the difference between log collection and metrics collection. They are complementary but serve different purposes:

Metrics: A series of numbers measured over a period of time.
Logs: A record of what happens while a program is running, including any errors, warnings, or notable events that occur.

An example of where you would need both metrics and logs is when an application is performing poorly. The first indication of the issue might be an alert for high latency on the pods hosting the application, but the metrics alone might not point to the cause. We can then look into the logs to investigate the errors that the application is emitting.
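The split between the two signals can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not how any particular monitoring stack works: the `checkout` logger name, the `record_request` and `high_latency` helpers, the in-memory sample list, and the 500 ms threshold are all hypothetical. The metric is a plain series of numbers that can trigger an alert, while the log entries carry the context needed to explain it.

```python
import logging
import time

# Hypothetical in-memory store; a real cluster would ship samples to a
# metrics backend and log lines to a log aggregator instead.
latency_samples = []                  # metric: (timestamp, milliseconds) pairs
log = logging.getLogger("checkout")   # hypothetical application name


def record_request(duration_ms, error=None):
    """Record one request as a metric sample and, on failure, a log entry."""
    latency_samples.append((time.time(), duration_ms))
    if error is not None:
        # The log line explains *what* went wrong, which the number cannot.
        log.error("request failed after %.0f ms: %s", duration_ms, error)


def high_latency(samples, threshold_ms):
    """Return True when the mean latency of the samples exceeds the threshold."""
    values = [ms for _, ms in samples]
    return bool(values) and sum(values) / len(values) > threshold_ms


record_request(40)
record_request(900, error="upstream database timeout")
record_request(1100, error="upstream database timeout")

# The metric tells us that something is wrong...
if high_latency(latency_samples, threshold_ms=500):
    # ...and the error lines logged above tell us where to look.
    print("alert: mean latency above threshold, check the application logs")
```

In this sketch the alert fires from the numbers alone, and the two logged errors are what identify the database timeout as the likely cause.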

Monitoring Techniques

Closed-box monitoring observes an application from the outside and is the traditional approach to monitoring system components such as CPU, memory, and storage. It can still be useful at the infrastructure level, but it lacks insight into, and context about, how the application is operating. For example, to test whether a ...