Kubernetes Best Practices, 2nd Edition

by Brendan Burns, Eddie Villalba, Dave Strebel, Lachlan Evenson

October 2023

Intermediate to advanced

324 pages

7h 46m

English

O'Reilly Media, Inc.

Preface
Who Should Read This Book; Why We Wrote This Book; Navigating This Book; New to This Edition; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Setting Up a Basic Service
Application Overview; Managing Configuration Files; Creating a Replicated Service Using Deployments; Best Practices for Image Management; Creating a Replicated Application; Setting Up an External Ingress for HTTP Traffic; Configuring an Application with ConfigMaps; Managing Authentication with Secrets; Deploying a Simple Stateful Database; Creating a TCP Load Balancer by Using Services; Using Ingress to Route Traffic to a Static File Server; Parameterizing Your Application by Using Helm; Deploying Services Best Practices; Summary
2. Developer Workflows
Goals; Building a Development Cluster; Setting Up a Shared Cluster for Multiple Developers; Onboarding Users; Creating and Securing a Namespace; Managing Namespaces; Cluster-Level Services; Enabling Developer Workflows; Initial Setup; Enabling Active Development; Enabling Testing and Debugging; Setting Up a Development Environment Best Practices; Summary
3. Monitoring and Logging in Kubernetes
Metrics Versus Logs; Monitoring Techniques; Monitoring Patterns; Kubernetes Metrics Overview; cAdvisor; Metrics Server; kube-state-metrics; What Metrics Do I Monitor?; Monitoring Tools; Monitoring Kubernetes Using Prometheus; Logging Overview; Tools for Logging; Logging by Using a Loki-Stack; Alerting; Best Practices for Monitoring, Logging, and Alerting; Monitoring; Logging; Alerting; Summary
4. Configuration, Secrets, and RBAC
Configuration Through ConfigMaps and Secrets; ConfigMaps; Secrets; Common Best Practices for the ConfigMap and Secrets APIs; Best Practices Specific to Secrets; RBAC; RBAC Primer; RBAC Best Practices; Summary
5. Continuous Integration, Testing, and Deployment
Version Control; Continuous Integration; Testing; Container Builds; Container Image Tagging; Continuous Deployment; Deployment Strategies; Testing in Production; Setting Up a Pipeline and Performing a Chaos Experiment; Setting Up CI; Setting Up CD; Performing a Rolling Upgrade; A Simple Chaos Experiment; Best Practices for CI/CD; Summary
6. Versioning, Releases, and Rollouts
Versioning; Releases; Rollouts; Putting It All Together; Best Practices for Versioning, Releases, and Rollouts; Summary
7. Worldwide Application Distribution and Staging
Distributing Your Image; Parameterizing Your Deployment; Load-Balancing Traffic Around the World; Reliably Rolling Out Software Around the World; Pre-Rollout Validation; Canary Region; Identifying Region Types; Constructing a Global Rollout; When Something Goes Wrong; Worldwide Rollout Best Practices; Summary
8. Resource Management
Kubernetes Scheduler; Predicates; Priorities; Advanced Scheduling Techniques; Pod Affinity and Anti-Affinity; nodeSelector; Taints and Tolerations; Pod Resource Management; Resource Request; Resource Limits and Pod Quality of Service; PodDisruptionBudgets; Managing Resources by Using Namespaces; ResourceQuota; LimitRange; Cluster Scaling; Application Scaling; Scaling with HPA; HPA with Custom Metrics; Vertical Pod Autoscaler; Resource Management Best Practices; Summary
9. Networking, Network Security, and Service Mesh
Kubernetes Network Principles; Network Plug-ins; Kubenet; Kubenet Best Practices; The CNI Plug-in; CNI Best Practices; Services in Kubernetes; Service Type ClusterIP; Service Type NodePort; Service Type ExternalName; Service Type LoadBalancer; Ingress and Ingress Controllers; Gateway API; Services and Ingress Controllers Best Practices; Network Security Policy; Network Policy Best Practices; Service Meshes; Service Mesh Best Practices; Summary
10. Pod and Container Security
Pod Security Admission Controller; Enabling Pod Security Admission; Pod Security levels; Activating Pod Security Using Namespace Labels; Workload Isolation and RuntimeClass; Using RuntimeClass; Runtime Implementations; Workload Isolation and RuntimeClass Best Practices; Other Pod and Container Security Considerations; Admission Controllers; Intrusion and Anomaly Detection Tooling; Summary
11. Policy and Governance for Your Cluster
Why Policy and Governance Are Important; How Is This Policy Different?; Cloud Native Policy Engine; Introducing Gatekeeper; Example Policies; Gatekeeper Terminology; Defining Constraint Templates; Defining Constraints; Data Replication; UX; Using Enforcement Action and Audit; Mutation; Testing Policies; Becoming Familiar with Gatekeeper; Policy and Governance Best Practices; Summary
12. Managing Multiple Clusters
Why Multiple Clusters?; Multicluster Design Concerns; Managing Multiple Cluster Deployments; Deployment and Management Patterns; The GitOps Approach to Managing Clusters; Multicluster Management Tools; Kubernetes Federation; Managing Multiple Clusters Best Practices; Summary
13. Integrating External Services with Kubernetes
Importing Services into Kubernetes; Selector-Less Services for Stable IP Addresses; CNAME-Based Services for Stable DNS Names; Active Controller-Based Approaches; Exporting Services from Kubernetes; Exporting Services by Using Internal Load Balancers; Exporting Services on NodePorts; Integrating External Machines and Kubernetes; Sharing Services Between Kubernetes; Third-Party Tools; Connecting Cluster and External Services Best Practices; Summary
14. Running Machine Learning in Kubernetes
Why Is Kubernetes Great for Machine Learning?; Machine Learning Workflow; Machine Learning for Kubernetes Cluster Admins; Model Training on Kubernetes; Distributed Training on Kubernetes; Resource Constraints; Specialized Hardware; Libraries, Drivers, and Kernel Modules; Storage; Networking; Specialized Protocols; Data Scientist Concerns; Machine Learning on Kubernetes Best Practices; Summary
15. Building Higher-Level Application Patterns on Top of Kubernetes
Approaches to Developing Higher-Level Abstractions; Extending Kubernetes; Extending Kubernetes Clusters; Extending the Kubernetes User Experience; Making Containerized Development Easier; Developing a “Push-to-Deploy” Experience; Design Considerations When Building Platforms; Support Exporting to a Container Image; Support Existing Mechanisms for Service and Service Discovery; Building Application Platforms Best Practices; Summary
16. Managing State and Stateful Applications
Volumes and Volume Mounts; Volume Best Practices; Kubernetes Storage; PersistentVolume; PersistentVolumeClaims; StorageClasses; Kubernetes Storage Best Practices; Stateful Applications; StatefulSets; Operators; StatefulSet and Operator Best Practices; Summary
17. Admission Control and Authorization
Admission Control; What Are They?; Why Are They Important?; Admission Controller Types; Configuring Admission Webhooks; Admission Control Best Practices; Authorization; Authorization Modules; Authorization Best Practices; Summary
18. GitOps and Deployment
What Is GitOps?; Why GitOps?; GitOps Repo Structure; Managing Secrets; Setting Up Flux; GitOps Tooling; GitOps Best Practices; Summary
19. Security
Cluster Security; etcd Access; Authentication; Authorization; TLS; Kubelet and Cloud Metadata Access; Secrets; Logging and Auditing; Cluster Security Posture Tooling; Cluster Security Best Practices; Workload Container Security; Pod Security Admission; Seccomp, AppArmor, and SELinux; Admission Controllers; Operators; Network Policy; Runtime Security; Workload Container Security Best Practices; Code Security; Non-Root and Distroless Containers; Container Vulnerability Scanning; Code Repository Security; Code Security Best Practices; Summary
20. Chaos Testing, Load Testing, and Experiments
Chaos Testing; Goals for Chaos Testing; Prerequisites for Chaos Testing; Chaos Testing Your Application’s Communication; Chaos Testing Your Application’s Operation; Fuzz Testing Your Application for Security and Resilience; Summary; Load Testing; Goals for Load Testing; Prerequisites for Load Testing; Generating Realistic Traffic; Load Testing Your Application; Tuning Your Application Using Load Tests; Summary; Experiments; Goals for Experiments; Prerequisites for an Experiment; Setting Up an Experiment; Summary; Chaos Testing, Load Testing, and Experiments Summary
21. Implementing an Operator
Operator Key Components; Custom Resource Definitions; Creating Our API; Controller Reconciliation; Resource Validation; Controller Implementation; Operator Life Cycle; Version Upgrades; Operator Best Practices; Summary
22. Conclusion
Index
About the Authors

Content preview from Kubernetes Best Practices, 2nd Edition

Chapter 14. Running Machine Learning in Kubernetes

The age of microservices, distributed systems, and the cloud has provided ideal conditions for the democratization of machine learning models and tooling. Infrastructure at scale has become commoditized, and the tooling around the machine learning ecosystem is maturing. Kubernetes has become increasingly popular among developers, data scientists, and the wider open source community as an environment for enabling the machine learning workflow and life cycle. Large machine learning models like GPT-4 and DALL·E have brought machine learning into the spotlight, and organizations like OpenAI have been very public about their use of Kubernetes to support these models. In this chapter, we cover why Kubernetes is a great platform for machine learning and provide best practices for cluster administrators and data scientists on how to get the most out of Kubernetes when running machine learning workloads. Specifically, we focus on deep learning rather than traditional machine learning because deep learning has quickly become the area of innovation on platforms like Kubernetes.

Why Is Kubernetes Great for Machine Learning?

Kubernetes has quickly become the home for rapid innovation in deep learning. The confluence of tooling and libraries such as TensorFlow makes this technology more accessible to a large audience of data scientists. What makes Kubernetes such a great ...

Publisher Resources

ISBN: 9781098142155