Kubeflow Operations Guide

by Josh Patterson, Michael Katzenellenbogen, Austin Harris

December 2020

Intermediate to advanced

301 pages

7h 18m

English

O'Reilly Media, Inc.

Preface
  What Is in This Book?
  Who Is This Book For?
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments
  Josh
  Michael
  Austin
1. Introduction to Kubeflow
  Machine Learning on Kubernetes
  The Evolution of Machine Learning in Enterprise
  It’s Harder Than Ever to Run Enterprise Infrastructure
  Identifying Next-Generation Infrastructure (NGI) Core Principles
  Kubernetes for Production Application Deployment
  Enter: Kubeflow
  What Problems Does Kubeflow Solve?
  Origin of Kubeflow
  Who Uses Kubeflow?
  Common Kubeflow Use Cases
  Running Notebooks on GPUs
  Shared Multitenant Machine Learning Environment
  Building a Transfer Learning Pipeline
  Deploying Models to Production for Application Integration
  Components of Kubeflow
  Machine Learning Tools
  Applications and Scaffolding
  Machine Learning Model Inference Serving with KFServing
  Platforms and Clouds
  Summary
2. Kubeflow Architecture and Best Practices
  Kubeflow Architecture Overview
  Kubeflow and Kubernetes
  Ways to Run a Job on Kubeflow
  Machine Learning Metadata Service
  Artifact Storage
  Istio Operations in Kubeflow
  Kubeflow Multitenancy Architecture
  Multitenancy and Isolation
  Multiuser Architecture
  Multiuser Authorization Flow
  Kubeflow Profiles
  Multiuser Isolation
  Notebook Architecture
  Notebook Server Launcher UI
  Notebook Controller
  Pipelines Architecture
  Kubeflow Best Practices
  Managing Job Dependencies
  Using GPUs
  Experiment Management
  Summary
3. Planning a Kubeflow Installation
  Security Planning
  Components That Extend the Kubernetes API
  Components Running Atop Kubernetes
  Background and Motivation
  Kubeflow and Deployed Applications
  Integration
  Users
  Profiling Users
  Varying Skillsets
  Workloads
  Cluster Utilization
  Data Patterns
  GPU Planning
  Planning for GPUs
  Models that Benefit from GPUs
  Infrastructure Planning
  Kubernetes Considerations
  On-Premise
  Cloud
  Placement
  Container Management
  Serverless Container Operations with Knative
  Sizing and Growing
  Forecasting
  Storage
  Scaling
  Summary
4. Installing Kubeflow On-Premise
  Kubernetes Operations from the Command Line
  Installing kubectl
  Using kubectl
  Using Docker
  Basic Install Process
  Installing On-Premise
  Considerations for Building Kubernetes Clusters
  Gateway Host Access to Kubernetes Cluster
  Active Directory Integration and User Management
  Kerberos Integration
  Storage Integration
  Container Management and Artifact Repositories
  Accessing and Interacting with Kubeflow
  Common Command-Line Operations
  Accessible Web UIs
  Installing Kubeflow
  System Requirements
  Set Up and Deploy
  Summary
5. Running Kubeflow on Google Cloud
  Overview of the Google Cloud Platform
  Storage
  Google Cloud Identity-Aware Proxy
  Google Cloud Security and the Cloud Identity-Aware Proxy
  GCP Projects for Application Deployments
  GCP Service Accounts
  Signing Up for Google Cloud Platform
  Installing the Google Cloud SDK
  Update Python
  Download and Install Google Cloud SDK
  Installing Kubeflow on Google Cloud Platform
  Create a Project in the GCP Console
  Enabling APIs for a Project
  Set Up OAuth for GCP Cloud IAP
  Deploy Kubeflow Using the Command-Line Interface
  Accessing the Kubeflow UI Post-Installation
  Summary
6. Running Kubeflow on Amazon Web Services
  Overview of Amazon Web Services
  Storage
  Amazon Storage Pricing
  Amazon Cloud Security
  AWS Compute Services
  Managed Kubernetes on EKS
  Signing Up for Amazon Web Services
  Installing the AWS CLI
  Update Python
  Install the AWS CLI
  Kubeflow on Amazon Web Services
  Installing kubectl
  Install the eksctl CLI for Amazon EKS
  Install AWS IAM Authenticator
  Install jq
  Using Managed Kubernetes on Amazon EKS
  Create an EKS Service Role
  Create an AWS VPC
  Creating EKS Clusters
  Deploying an EKS Cluster with eksctl
  Understanding the Deployment Process
  Kubeflow Configuration and Deployment
  Customize the Kubeflow Deployment
  Customize Authentication
  Resizing EKS Clusters
  Deleting EKS Clusters
  Adding Logging
  Troubleshooting Deployments
  Summary
7. Running Kubeflow on Azure
  Overview of the Azure Cloud Platform
  Key Azure Components
  Storage on Azure
  The Azure Security Model
  Service Accounts
  Resources and Resource Groups
  Azure Virtual Machines
  Containers and Managed Azure Kubernetes Services
  The Azure CLI
  Installing the Azure CLI
  Installing Kubeflow on Azure Kubernetes
  Azure Login and Configuration
  Create an AKS Cluster for Kubeflow
  Kubeflow Installation
  Authorizing Network Access to Deployment
  Summary
8. Model Serving and Integration
  Basic Concepts of Model Management
  Understanding Training Models Versus Model Inference
  Building an Intuition for Model Integration
  Scaling Model Inference Throughput
  Model Management
  Introduction to KFServing
  Advantages of Using KFServing
  Core Concepts in KFServing
  Supported Pre-Built Model Servers
  KFServing Security Model
  Managing Models with KFServing
  Installing KFServing on a Kubernetes Cluster
  Deploying a Model on KFServing
  Managing Model Traffic with Canarying
  Deploying a Custom Transformer
  Roll Back a Deployed Model
  Removing a Deployed Model
  Summary
A. Infrastructure Concepts
  Public Key Infrastructure
  Authentication
  Kubeflow and Authentication
  Authorization
  Authorization and Role-Based Access Control
  Lightweight Directory Access Protocol
  Kerberos
  Transport Layer Security
  X.509 Cert
  Webhook
  Active Directory
  Identity Providers
  Identity-Aware Proxy (IAP)
  IAP and Google Cloud Platform
  OAuth
  OpenID Connect
  End-User Authentication with JWT
  Simple and Protected GSS_API Negotiation Mechanism
  Dex: A Federated OpenID Connect Provider
  Dex and Kerberos
  Service Accounts
  The Control Plane
  Options for Securing the Control Plane

B. An Overview of Kubernetes
  Core Kubernetes Concepts
  Pod
  Object Spec and Status
  Describing a Kubernetes Object
  Submitting Containers to Kubernetes
  Kubernetes Resource Model
  Custom Resources, Controllers, and Operators
  Custom Controllers
  Custom Resource Definition
C. Istio Operations and Kubeflow
  Service Mesh Management with Istio
  Istio Architecture
  Traffic Management
  Istio Security Architecture
  Istio Authorization and Role-Based Access Control
Index

Content preview from Kubeflow Operations Guide

Chapter 1. Introduction to Kubeflow

Kubeflow is an open source Kubernetes-native platform for developing, orchestrating, deploying, and running scalable and portable machine learning (ML) workloads. It is a cloud native platform based on Google’s internal ML pipelines. The project is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable.

In this book we take a look at the evolution of machine learning in enterprise, how infrastructure has changed, and then how Kubeflow meets the needs of the modern enterprise.

Operating Kubeflow in an increasingly multicloud and hybrid-cloud world will be a key topic as both the market and Kubernetes adoption grow. A single workflow may have a life cycle that starts on-premise but quickly requires resources that are only available in the cloud. Kubeflow began as an effort to build machine learning tooling on Kubernetes, then an emerging platform, so let’s start there.

Machine Learning on Kubernetes

Kubeflow began life as a basic way to get rudimentary machine learning infrastructure running on Kubernetes. The two driving forces in its development and adoption are the evolution of machine learning in enterprise and the emergence of Kubernetes as the de facto infrastructure management layer.

Let’s take a quick tour of the recent history of machine learning in enterprise to better understand how we got here.

The Evolution of Machine Learning in Enterprise

The past decade has seen the popularity and ...

Publisher Resources

ISBN: 9781492053262