Video description
In Video Editions the narrator reads the book while the content, figures, code listings, diagrams, and text appear on the screen. Like an audiobook that you can also watch as a video.
Practical patterns for scaling machine learning from your laptop to a distributed cluster.
Distributed machine learning systems allow developers to handle extremely large datasets across multiple clusters, take advantage of automation tools, and benefit from hardware acceleration. This book reveals best-practice techniques and insider tips for tackling the challenges of scaling machine learning systems.
In Distributed Machine Learning Patterns you will learn how to:
- Apply distributed systems patterns to build scalable and reliable machine learning projects
- Build ML pipelines with data ingestion, distributed training, model serving, and more
- Automate ML tasks with Kubernetes, TensorFlow, Kubeflow, and Argo Workflows
- Make trade-offs between different patterns and approaches
- Manage and monitor machine learning workloads at scale
Inside Distributed Machine Learning Patterns, you’ll learn to apply established distributed systems patterns to machine learning projects, plus explore cutting-edge new patterns created specifically for machine learning. Firmly rooted in the real world, this book demonstrates how to apply patterns using examples based on TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. Hands-on projects and clear, practical DevOps techniques let you easily launch, manage, and monitor cloud-native distributed machine learning pipelines.
About the Technology
Deploying a machine learning application on a modern distributed system puts the spotlight on reliability, performance, security, and other operational concerns. In this in-depth guide, Yuan Tang, project lead of Argo and Kubeflow, shares patterns, examples, and hard-won insights on taking an ML model from a single device to a distributed cluster.
About the Book
Distributed Machine Learning Patterns provides dozens of techniques for designing and deploying distributed machine learning systems. In it, you’ll learn patterns for distributed model training, managing unexpected failures, and dynamic model serving. You’ll appreciate the practical examples that accompany each pattern along with a full-scale project that implements distributed model training and inference with autoscaling on Kubernetes.
What's Inside
- Data ingestion, distributed training, model serving, and more
- Automating Kubernetes and TensorFlow with Kubeflow and Argo Workflows
- Managing and monitoring workloads at scale
About the Reader
For data analysts and engineers familiar with the basics of machine learning, Bash, Python, and Docker.
About the Author
Yuan Tang is a project lead of Argo and Kubeflow, maintainer of TensorFlow and XGBoost, and author of numerous open source projects.
Quotes
Approachable for beginners and inspirational for experienced practitioners. As soon as I finished reading, I was ready to start building.
- James Lamb, SpotHero
Exceptionally timely and comprehensive. Its pattern perspective, accompanied by real-world examples and widely adopted systems like Kubernetes, Kubeflow, and Argo, truly set it apart.
- Yuan Chen, Apple
An amazing guide to designing resilient and scalable ML systems for both training and serving models.
- Ryan Russon, Capital One
A wonderful book! Machine learning at scale explained clearly and from first principles!
- Laurence Moroney, Google
Table of contents
- Part 1. Basic concepts and background
  - Chapter 1. Introduction to distributed machine learning systems
    - Distributed systems
    - Distributed machine learning systems
    - What we will learn in this book
    - Summary
- Part 2. Patterns of distributed machine learning systems
  - Chapter 2. Data ingestion patterns
    - The Fashion-MNIST dataset
    - Batching pattern
    - Sharding pattern: Splitting extremely large datasets among multiple machines
    - Caching pattern
    - Answers to exercises
    - Summary
  - Chapter 3. Distributed training patterns
    - Parameter server pattern: Tagging entities in 8 million YouTube videos
    - Collective communication pattern
    - Elasticity and fault-tolerance pattern
    - Answers to exercises
    - Summary
  - Chapter 4. Model serving patterns
    - Replicated services pattern: Handling the growing number of serving requests
    - Sharded services pattern
    - The event-driven processing pattern
    - Answers to exercises
    - Summary
  - Chapter 5. Workflow patterns
    - Fan-in and fan-out patterns: Composing complex machine learning workflows
    - Synchronous and asynchronous patterns: Accelerating workflows with concurrency
    - Step memoization pattern: Skipping redundant workloads via memoized steps
    - Answers to exercises
    - Summary
  - Chapter 6. Operation patterns
    - Scheduling patterns: Assigning resources effectively in a shared cluster
    - Metadata pattern: Handle failures appropriately to minimize the negative effect on users
    - Answers to exercises
    - Summary
- Part 3. Building a distributed machine learning workflow
  - Chapter 7. Project overview and system architecture
    - Data ingestion
    - Model training
    - Model serving
    - End-to-end workflow
    - Answers to exercises
    - Summary
  - Chapter 8. Overview of relevant technologies
    - Kubernetes: The distributed container orchestration system
    - Kubeflow: Machine learning workloads on Kubernetes
    - Argo Workflows: Container-native workflow engine
    - Answers to exercises
    - Summary
  - Chapter 9. A complete implementation
    - Model training
    - Model serving
    - The end-to-end workflow
    - Summary
Product information
- Title: Distributed Machine Learning Patterns, Video Edition
- Author(s): Yuan Tang
- Release date: January 2024
- Publisher(s): Manning Publications