Machine Learning on Kubernetes

Book description

Build a Kubernetes-based, self-service, agile data science and machine learning ecosystem for your organization using reliable and secure open source technologies.

Key Features

  • Build a complete machine learning platform on Kubernetes
  • Improve the agility and velocity of your team by adopting the self-service capabilities of the platform
  • Reduce time-to-market by automating data pipelines and model training and deployment

Book Description

MLOps is an emerging field that aims to bring the repeatability, automation, and standardization of software engineering to data science and machine learning engineering. By implementing MLOps with Kubernetes, data scientists, IT professionals, and data engineers can collaborate to build machine learning solutions that deliver business value for their organization.

You'll begin by understanding the different components of a machine learning project. Then, you'll design and build a practical end-to-end machine learning project using open source software. As you progress, you'll understand the basics of MLOps and the value it can bring to machine learning projects. You'll also gain experience in building, configuring, and using an open source, containerized machine learning platform. In later chapters, you'll prepare data, build and deploy machine learning models, and automate workflow tasks using the same platform. Finally, the exercises in this book will give you hands-on experience with Kubernetes and open source tools such as JupyterHub, MLflow, and Airflow.

By the end of this book, you'll have learned how to effectively build, train, and deploy a machine learning model using the machine learning platform you built.

What you will learn

  • Understand the different stages of a machine learning project
  • Use open source software to build a machine learning platform on Kubernetes
  • Implement a complete ML project using the machine learning platform presented in this book
  • Improve your organization's collaborative journey toward machine learning
  • Discover how to use the platform as a data engineer, ML engineer, or data scientist
  • Find out how to apply machine learning to solve real business problems

Who this book is for

This book is for data scientists, data engineers, IT platform owners, AI product owners, and data architects who want to build their own platform for ML development. Although this book starts with the basics, a solid understanding of Python and Kubernetes, along with knowledge of the basic concepts of data science and data engineering, will help you grasp the topics covered in this book more easily.

Table of contents

  1. Machine Learning on Kubernetes
  2. Contributors
  3. About the authors
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
    9. Share Your Thoughts
  6. Part 1: The Challenges of Adopting ML and Understanding MLOps (What and Why)
  7. Chapter 1: Challenges in Machine Learning
    1. Understanding ML
    2. Delivering ML value
    3. Choosing the right approach
      1. The importance of data
    4. Facing the challenges of adopting ML
      1. Focusing on the big picture
      2. Breaking down silos
      3. Fail-fast culture
    5. An overview of the ML platform
    6. Summary
    7. Further reading
  8. Chapter 2: Understanding MLOps
    1. Comparing ML to traditional programming
    2. Exploring the benefits of DevOps
    3. Understanding MLOps
      1. ML
      2. DevOps
      3. ML project life cycle
      4. Fast feedback loop
      5. Collaborating over the project life cycle
    4. The role of OSS in ML projects
    5. Running ML projects on Kubernetes
    6. Summary
    7. Further reading
  9. Chapter 3: Exploring Kubernetes
    1. Technical requirements
    2. Exploring Kubernetes major components
      1. Control plane
      2. Worker nodes
      3. Kubernetes objects required to run an application
    3. Becoming cloud-agnostic through Kubernetes
    4. Understanding Operators
    5. Setting up your local Kubernetes environment
      1. Installing kubectl
      2. Installing minikube
      3. Installing OLM
    6. Provisioning a VM on GCP
    7. Summary
  10. Part 2: The Building Blocks of an MLOps Platform and How to Build One on Kubernetes
  11. Chapter 4: The Anatomy of a Machine Learning Platform
    1. Technical requirements
    2. Defining a self-service platform
    3. Exploring the data engineering components
      1. Data engineer workflow
    4. Exploring the model development components
      1. Understanding the data scientist workflow
    5. Security, monitoring, and automation
    6. Introducing ODH
      1. Installing the ODH operator on Kubernetes
      2. Enabling the ingress controller on the Kubernetes cluster
      3. Installing Keycloak on Kubernetes
    7. Summary
    8. Further reading
  12. Chapter 5: Data Engineering
    1. Technical requirements
    2. Configuring Keycloak for authentication
      1. Importing the Keycloak configuration for the ODH components
      2. Creating a Keycloak user
    3. Configuring ODH components
      1. Installing ODH
    4. Understanding and using JupyterHub
      1. Validating the JupyterHub installation
      2. Running your first Jupyter notebook
    5. Understanding the basics of Apache Spark
      1. Understanding Apache Spark job execution
    6. Understanding how ODH provisions an Apache Spark cluster on demand
      1. Creating a Spark cluster
      2. Understanding how JupyterHub creates a Spark cluster
    7. Writing and running a Spark application from Jupyter Notebook
    8. Summary
  13. Chapter 6: Machine Learning Engineering
    1. Technical requirements
    2. Understanding ML engineering
    3. Using a custom notebook image
      1. Building a custom notebook container image
    4. Introducing MLflow
      1. Understanding MLflow components
      2. Validating the MLflow installation
    5. Using MLflow as an experiment tracking system
      1. Adding custom data to the experiment run
    6. Using MLflow as a model registry system
    7. Summary
  14. Chapter 7: Model Deployment and Automation
    1. Technical requirements
    2. Understanding model inferencing with Seldon Core
      1. Wrapping the model using Python
      2. Containerizing the model
      3. Deploying the model using the Seldon controller
    3. Packaging, running, and monitoring a model using Seldon Core
    4. Introducing Apache Airflow
      1. Understanding DAG
      2. Exploring Airflow features
      3. Understanding Airflow components
      4. Validating the Airflow installation
      5. Configuring the Airflow DAG repository
      6. Configuring Airflow runtime images
    5. Automating ML model deployments in Airflow
      1. Creating the pipeline by using the pipeline editor
    6. Summary
  15. Part 3: How to Use the MLOps Platform and Build a Full End-to-End Project Using the New Platform
  16. Chapter 8: Building a Complete ML Project Using the Platform
    1. Reviewing the complete picture of the ML platform
    2. Understanding the business problem
    3. Data collection, processing, and cleaning
      1. Understanding data sources, location, and the format
      2. Understanding data processing and cleaning
    4. Performing exploratory data analysis
      1. Understanding sample data
    5. Understanding feature engineering
      1. Data augmentation
    6. Building and evaluating the ML model
      1. Selecting evaluation criteria
      2. Building the model
      3. Deploying the model
    7. Reproducibility
    8. Summary
  17. Chapter 9: Building Your Data Pipeline
    1. Technical requirements
    2. Automated provisioning of a Spark cluster for development
    3. Writing a Spark data pipeline
      1. Preparing the environment
      2. Understanding data
      3. Designing and building the pipeline
      4. Using the Spark UI to monitor your data pipeline
    4. Building and executing a data pipeline using Airflow
      1. Understanding the data pipeline DAG
      2. Building and running the DAG
    5. Summary
  18. Chapter 10: Building, Deploying, and Monitoring Your Model
    1. Technical requirements
    2. Visualizing and exploring data using JupyterHub
    3. Building and tuning your model using JupyterHub
    4. Tracking model experiments and versioning using MLflow
      1. Tracking model experiments
      2. Versioning models
    5. Deploying the model as a service
      1. Calling your model
    6. Monitoring your model
      1. Understanding monitoring components
      2. Configuring Grafana and a dashboard
    7. Summary
  19. Chapter 11: Machine Learning on Kubernetes
    1. Identifying ML platform use cases
      1. Considering AutoML
      2. Commercial platforms
      3. ODH
    2. Operationalizing ML
      1. Setting the business expectations
      2. Dealing with dirty real-world data
      3. Dealing with incorrect results
      4. Maintaining continuous delivery
      5. Managing security
      6. Adhering to compliance policies
      7. Applying governance
    3. Running on Kubernetes
      1. Avoiding vendor lock-ins
      2. Considering other Kubernetes platforms
    4. Roadmap
    5. Summary
    6. Further reading
    7. Why subscribe?
  20. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Machine Learning on Kubernetes
  • Author(s): Faisal Masood, Ross Brigoli
  • Release date: June 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803241807