Kubeflow Operations Guide

Book description

Building models is only a small part of deploying machine learning applications. The entire process involves developing, orchestrating, deploying, and running scalable and portable machine learning workloads, a process that Kubeflow makes much easier. This practical book shows data scientists, data engineers, and platform architects how to plan and execute a Kubeflow project to make their Kubernetes workflows portable and scalable.

Authors Josh Patterson, Michael Katzenellenbogen, and Austin Harris demonstrate how this open source platform orchestrates workflows by managing machine learning pipelines. You'll learn how to plan and deploy a Kubeflow platform that supports workflows both on-premises and on cloud providers, including Google, Amazon, and Microsoft.

  • Dive into Kubeflow architecture and learn best practices for using the platform
  • Understand the process of planning your Kubeflow deployment
  • Install Kubeflow on an existing on-premise Kubernetes cluster
  • Deploy Kubeflow on Google Cloud Platform, AWS, and Azure
  • Use KFServing to develop and deploy machine learning models

Table of contents

  1. Preface
    1. What Is in This Book?
    2. Who Is This Book For?
    3. Conventions Used in This Book
    4. Using Code Examples
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgments
      1. Josh
      2. Michael
      3. Austin
  2. 1. Introduction to Kubeflow
    1. Machine Learning on Kubernetes
      1. The Evolution of Machine Learning in Enterprise
      2. It’s Harder Than Ever to Run Enterprise Infrastructure
      3. Identifying Next-Generation Infrastructure (NGI) Core Principles
      4. Kubernetes for Production Application Deployment
      5. Enter: Kubeflow
      6. What Problems Does Kubeflow Solve?
      7. Origin of Kubeflow
      8. Who Uses Kubeflow?
    2. Common Kubeflow Use Cases
      1. Running Notebooks on GPUs
      2. Shared Multitenant Machine Learning Environment
      3. Building a Transfer Learning Pipeline
      4. Deploying Models to Production for Application Integration
    3. Components of Kubeflow
      1. Machine Learning Tools
      2. Applications and Scaffolding
      3. Machine Learning Model Inference Serving with KFServing
      4. Platforms and Clouds
    4. Summary
  3. 2. Kubeflow Architecture and Best Practices
    1. Kubeflow Architecture Overview
      1. Kubeflow and Kubernetes
      2. Ways to Run a Job on Kubeflow
      3. Machine Learning Metadata Service
      4. Artifact Storage
      5. Istio Operations in Kubeflow
    2. Kubeflow Multitenancy Architecture
      1. Multitenancy and Isolation
      2. Multiuser Architecture
      3. Multiuser Authorization Flow
      4. Kubeflow Profiles
      5. Multiuser Isolation
    3. Notebook Architecture
      1. Notebook Server Launcher UI
      2. Notebook Controller
    4. Pipelines Architecture
    5. Kubeflow Best Practices
      1. Managing Job Dependencies
      2. Using GPUs
      3. Experiment Management
    6. Summary
  4. 3. Planning a Kubeflow Installation
    1. Security Planning
      1. Components That Extend the Kubernetes API
      2. Components Running Atop Kubernetes
      3. Background and Motivation
      4. Kubeflow and Deployed Applications
      5. Integration
    2. Users
      1. Profiling Users
      2. Varying Skillsets
    3. Workloads
      1. Cluster Utilization
      2. Data Patterns
    4. GPU Planning
      1. Planning for GPUs
      2. Models that Benefit from GPUs
    5. Infrastructure Planning
      1. Kubernetes Considerations
      2. On-Premise
      3. Cloud
      4. Placement
    6. Container Management
    7. Serverless Container Operations with Knative
    8. Sizing and Growing
      1. Forecasting
      2. Storage
      3. Scaling
    9. Summary
  5. 4. Installing Kubeflow On-Premise
    1. Kubernetes Operations from the Command Line
      1. Installing kubectl
      2. Using kubectl
      3. Using Docker
    2. Basic Install Process
      1. Installing On-Premise
        1. Considerations for Building Kubernetes Clusters
        2. Gateway Host Access to Kubernetes Cluster
        3. Active Directory Integration and User Management
        4. Kerberos Integration
        5. Storage Integration
        6. Container Management and Artifact Repositories
    3. Accessing and Interacting with Kubeflow
      1. Common Command-Line Operations
      2. Accessible Web UIs
    4. Installing Kubeflow
      1. System Requirements
      2. Set Up and Deploy
    5. Summary
  6. 5. Running Kubeflow on Google Cloud
    1. Overview of the Google Cloud Platform
      1. Storage
      2. Google Cloud Identity-Aware Proxy
      3. Google Cloud Security and the Cloud Identity-Aware Proxy
      4. GCP Projects for Application Deployments
      5. GCP Service Accounts
      6. Signing Up for Google Cloud Platform
    2. Installing the Google Cloud SDK
      1. Update Python
      2. Download and Install Google Cloud SDK
    3. Installing Kubeflow on Google Cloud Platform
      1. Create a Project in the GCP Console
      2. Enabling APIs for a Project
      3. Set Up OAuth for GCP Cloud IAP
      4. Deploy Kubeflow Using the Command-Line Interface
      5. Accessing the Kubeflow UI Post-Installation
    4. Summary
  7. 6. Running Kubeflow on Amazon Web Services
    1. Overview of Amazon Web Services
      1. Storage
      2. Amazon Storage Pricing
      3. Amazon Cloud Security
      4. AWS Compute Services
      5. Managed Kubernetes on EKS
    2. Signing Up for Amazon Web Services
    3. Installing the AWS CLI
      1. Update Python
      2. Install the AWS CLI
    4. Kubeflow on Amazon Web Services
      1. Installing kubectl
      2. Install the eksctl CLI for Amazon EKS
      3. Install AWS IAM Authenticator
      4. Install jq
    5. Using Managed Kubernetes on Amazon EKS
      1. Create an EKS Service Role
      2. Create an AWS VPC
      3. Creating EKS Clusters
      4. Deploying an EKS Cluster with eksctl
    6. Understanding the Deployment Process
      1. Kubeflow Configuration and Deployment
      2. Customize the Kubeflow Deployment
      3. Customize Authentication
      4. Resizing EKS Clusters
      5. Deleting EKS Clusters
      6. Adding Logging
      7. Troubleshooting Deployments
    7. Summary
  8. 7. Running Kubeflow on Azure
    1. Overview of the Azure Cloud Platform
      1. Key Azure Components
      2. Storage on Azure
      3. The Azure Security Model
      4. Service Accounts
      5. Resources and Resource Groups
      6. Azure Virtual Machines
      7. Containers and Managed Azure Kubernetes Services
    2. The Azure CLI
      1. Installing the Azure CLI
    3. Installing Kubeflow on Azure Kubernetes
      1. Azure Login and Configuration
      2. Create an AKS Cluster for Kubeflow
      3. Kubeflow Installation
    4. Authorizing Network Access to Deployment
    5. Summary
  9. 8. Model Serving and Integration
    1. Basic Concepts of Model Management
      1. Understanding Training Models Versus Model Inference
      2. Building an Intuition for Model Integration
      3. Scaling Model Inference Throughput
      4. Model Management
    2. Introduction to KFServing
      1. Advantages of Using KFServing
      2. Core Concepts in KFServing
      3. Supported Pre-Built Model Servers
      4. KFServing Security Model
    3. Managing Models with KFServing
      1. Installing KFServing on a Kubernetes Cluster
      2. Deploying a Model on KFServing
      3. Managing Model Traffic with Canarying
      4. Deploying a Custom Transformer
      5. Roll Back a Deployed Model
      6. Removing a Deployed Model
    4. Summary
  10. A. Infrastructure Concepts
    1. Public Key Infrastructure
    2. Authentication
      1. Kubeflow and Authentication
    3. Authorization
      1. Authorization and Role-Based Access Control
    4. Lightweight Directory Access Protocol
    5. Kerberos
    6. Transport Layer Security
    7. X.509 Cert
    8. Webhook
    9. Active Directory
    10. Identity Providers
    11. Identity-Aware Proxy (IAP)
      1. IAP and Google Cloud Platform
    12. OAuth
    13. OpenID Connect
    14. End-User Authentication with JWT
    15. Simple and Protected GSS_API Negotiation Mechanism
    16. Dex: A Federated OpenID Connect Provider
      1. Dex and Kerberos
    17. Service Accounts
    18. The Control Plane
      1. Options for Securing the Control Plane
  11. B. An Overview of Kubernetes
    1. Core Kubernetes Concepts
      1. Pod
      2. Object Spec and Status
      3. Describing a Kubernetes Object
      4. Submitting Containers to Kubernetes
      5. Kubernetes Resource Model
      6. Custom Resources, Controllers, and Operators
      7. Custom Controllers
      8. Custom Resource Definition
  12. C. Istio Operations and Kubeflow
    1. Service Mesh Management with Istio
      1. Istio Architecture
      2. Traffic Management
      3. Istio Security Architecture
      4. Istio Authorization and Role-Based Access Control
  13. Index

Product information

  • Title: Kubeflow Operations Guide
  • Author(s): Josh Patterson, Michael Katzenellenbogen, Austin Harris
  • Release date: December 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492053279