Building Machine Learning Pipelines

Book Description

Companies are spending billions on machine learning projects, but it’s money wasted if the models can’t be deployed effectively. In this practical guide, Hannes Hapke and Catherine Nelson walk you through the steps of automating a machine learning pipeline using the TensorFlow ecosystem. You’ll learn the techniques and tools that will cut deployment time from days to minutes, so that you can focus on developing new models rather than maintaining legacy systems.

Data scientists, machine learning engineers, and DevOps engineers will discover how to go beyond model development to successfully productize their data science projects, while managers will better understand the role they play in helping to accelerate these projects. The book also explores new approaches for integrating data privacy into machine learning pipelines.

  • Understand the machine learning management lifecycle
  • Implement data pipelines with Apache Airflow and Kubeflow Pipelines
  • Work with data using TensorFlow tools like ML Metadata, TensorFlow Data Validation, and TensorFlow Transform
  • Analyze models with TensorFlow Model Analysis, and ship them with the TFX Pusher component once the TFX ModelValidator component has confirmed that the analysis results are an improvement
  • Deploy models in a variety of environments with TensorFlow Serving, TensorFlow Lite, and TensorFlow.js
  • Learn methods for adding privacy, including differential privacy with TensorFlow Privacy and federated learning with TensorFlow Federated
  • Design model feedback loops to grow your datasets and learn when to update your machine learning models

Table of Contents

  1. Introduction
    1. What Are Machine Learning Pipelines?
    2. Who is this book for?
    3. Why Machine Learning Pipelines?
    4. When should you think about Machine Learning Pipelines?
    5. Overview of Machine Learning Pipelines
      1. Data Ingestion and Data Versioning
      2. Data Validation
      3. Data Preprocessing
      4. Model Training and Tuning
      5. Model Analysis
      6. Model Versioning
      7. Model Deployment
      8. Feedback Loops
      9. Data Privacy
    6. Overview of the Chapters
    7. Our Example Project
      1. Project Structure
      2. Downloading the Dataset
      3. Our Machine Learning Model
      4. Goal of the Example Project
    8. Summary
  2. Pipeline Orchestration
    1. Why Pipeline Orchestration
    2. Directed Acyclic Graphs
    3. Machine Learning Pipelines with Apache Beam
      1. Setup
      2. Basic Pipeline
      3. Executing your Basic Pipeline
      4. Orchestrating TensorFlow Extended Pipelines with Apache Beam
    4. Machine Learning Pipelines with Apache Airflow
      1. Setup
      2. Basic Pipeline
      3. Orchestrating TensorFlow Extended Pipelines with Apache Airflow
    5. Machine Learning Pipelines with Kubeflow Pipelines
      1. Installation & Setup
      2. Orchestrating TensorFlow Extended Pipelines with Kubeflow Pipelines
    6. Which Orchestration Tool to Choose?
    7. Summary
  3. Data Validation with TensorFlow
    1. Why Data Validation?
    2. TensorFlow Data Validation
      1. Installation
      2. Generating Statistics from your Data
      3. Generating Schema from your Data
    3. Recognizing problems in your data
      1. Comparing Data Sets
      2. Updating the schema
      3. Data skew and drift
      4. Biased datasets
      5. Slicing data in TFDV
    4. Processing large Data Sets with Google Cloud Platform
    5. Integrate TensorFlow Data Validation into your Machine Learning Pipeline
    6. Summary
  4. Model Deployment with TensorFlow Serving
    1. A Simple Model Server
      1. Why it isn’t Recommended
    2. TensorFlow Serving
    3. TensorFlow Architecture Overview
    4. Exporting Models for TensorFlow Serving
    5. Model Signatures
    6. Inspecting Exported Models
      1. Inspecting the Model
      2. Testing the Model
    7. Setting up TensorFlow Serving
      1. Docker Installation
      2. Native Ubuntu Installation
      3. Building TensorFlow Serving from Source
    8. Configure a TensorFlow Server
      1. Single Model Configuration
      2. Multi Model Configuration
    9. REST vs gRPC
      1. Representational State Transfer
      2. Google Remote Procedure Calls
    10. Making predictions from the Model Server
      1. Getting model predictions via REST
      2. Using TensorFlow Serving via gRPC
    11. Model A/B Testing with TensorFlow Serving
    12. Requesting Model Meta Data from the Model Server
      1. REST Requests for Model Meta Data
      2. gRPC Requests for Model Meta Data
    13. Batching Inference Requests
      1. Configure Batch Predictions
    14. Other TensorFlow Serving Optimizations
    15. TensorFlow Serving Alternatives
      1. Seldon
      2. GraphPipe
      3. Simple TensorFlow Serving
      4. MLflow
    16. Deploying with Cloud Providers
      1. Use Cases
      2. Example Deployment with Google Cloud Platform
    17. Summary
  5. Feedback Loops
    1. Introduction to feedback loops
      1. Explicit and implicit feedback
      2. The data flywheel
      3. Feedback loops in the real world
    2. Design patterns for collecting feedback
      1. Users take some action as a result of the prediction
      2. Users rate the quality of the prediction
      3. Users correct the prediction
      4. Crowdsource the annotations
      5. Expert annotations
      6. Feedback is produced automatically by the system
    3. How to track feedback loops
      1. Tracking explicit feedback
      2. Tracking implicit feedback
    4. Summary
  6. Data Privacy for Machine Learning
    1. Introduction to Data Privacy
      1. Why do we care about data privacy?
      2. The simplest way to increase privacy
      3. What data needs to be kept private?
    2. Introduction to Differential Privacy
      1. Local and global differential privacy
      2. Epsilon, delta and the privacy budget
      3. Differential privacy for machine learning
    3. Introduction to TensorFlow Privacy
      1. Training with a differentially private optimizer
      2. Calculating epsilon
    4. Introduction to Federated Learning
    5. Federated Learning frameworks
    6. Introduction to encrypted machine learning
      1. Encrypted model training
      2. Converting a trained model to serve encrypted predictions
    7. Other methods for data privacy
    8. Summary
  7. Appendix: Introduction to Infrastructure for Machine Learning
    1. What is a container?
    2. Introduction to Docker
      1. Introduction to Docker images
      2. Building your first Docker image
      3. Dive into the Docker CLI
    3. Introduction to Kubernetes
      1. Some Kubernetes definitions
      2. Getting started with Minikube and kubectl
      3. Interacting with the Kubernetes CLI
      4. Defining a Kubernetes resource
    4. Deploying applications to Kubernetes

Product Information

  • Title: Building Machine Learning Pipelines
  • Author(s): Catherine Nelson, Hannes Hapke
  • Release date: August 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492053187