Companies are spending billions on machine learning projects, but it’s money wasted if the models can’t be deployed effectively. In this practical guide, Hannes Hapke and Catherine Nelson walk you through the steps of automating a machine learning pipeline using the TensorFlow ecosystem. You’ll learn the techniques and tools that will cut deployment time from days to minutes, so that you can focus on developing new models rather than maintaining legacy systems.
Data scientists, machine learning engineers, and DevOps engineers will discover how to go beyond model development to successfully productize their data science projects, while managers will better understand the role they play in helping to accelerate these projects. The book also explores new approaches for integrating data privacy into machine learning pipelines.
- Understand the machine learning management lifecycle
- Implement data pipelines with Apache Airflow and Kubeflow Pipelines
- Work with data using TensorFlow tools like ML Metadata, TensorFlow Data Validation, and TensorFlow Transform
- Analyze models with TensorFlow Model Analysis, and ship them with the TFX Model Pusher component once the ModelValidator TFX component has confirmed that the analysis results are an improvement
- Deploy models in a variety of environments with TensorFlow Serving, TensorFlow Lite, and TensorFlow.js
- Learn methods for adding privacy, including differential privacy with TensorFlow Privacy and federated learning with TensorFlow Federated
- Design model feedback loops to increase your data sets and learn when to update your machine learning models
Table of Contents
- What Are Machine Learning Pipelines?
- Who is this book for?
- Why Machine Learning Pipelines?
- When should you think about Machine Learning Pipelines?
- Overview of Machine Learning Pipelines
- Overview of the Chapters
- Our Example Project
2. Pipeline Orchestration
- Why Pipeline Orchestration?
- Directed Acyclic Graphs
- Machine Learning Pipelines with Apache Beam
- Machine Learning Pipelines with Apache Airflow
- Machine Learning Pipelines with Kubeflow Pipelines
- Which Orchestration Tool to Choose?
3. Data Validation with TensorFlow
- Why Data Validation?
- TensorFlow Data Validation
- Recognizing problems in your data
- Processing large Data Sets with Google Cloud Platform
- Integrate TensorFlow Data Validation into your Machine Learning Pipeline
4. Model Deployment with TensorFlow Serving
- A Simple Model Server
- TensorFlow Serving
- TensorFlow Architecture Overview
- Exporting Models for TensorFlow Serving
- Model Signatures
- Inspecting Exported Models
- Setting up TensorFlow Serving
- Configure a TensorFlow Server
- REST vs gRPC
- Making predictions from the Model Server
- Model A/B Testing with TensorFlow Serving
- Requesting Model Metadata from the Model Server
- Batching Inference Requests
- Other TensorFlow Serving Optimizations
- TensorFlow Serving Alternatives
- Deploying with Cloud Providers
5. Feedback Loops
- Introduction to feedback loops
- Design patterns for collecting feedback
- How to track feedback loops
6. Data Privacy for Machine Learning
- Introduction to Data Privacy
- Introduction to Differential Privacy
- Introduction to TensorFlow Privacy
- Introduction to Federated Learning
- Federated Learning frameworks
- Introduction to encrypted machine learning
- Other methods for data privacy
7. Appendix: Introduction to Infrastructure for Machine Learning
- What is a container?
- Introduction to Docker
- Introduction to Kubernetes
- Deploying applications to Kubernetes
- Title: Building Machine Learning Pipelines
- Release date: August 2020
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492053187