Distributed Machine Learning Patterns, Video Edition

Video description

In Video Editions, the narrator reads the book while the content, figures, code listings, diagrams, and text appear on the screen. It's like an audiobook that you can also watch as a video.

Practical patterns for scaling machine learning from your laptop to a distributed cluster.

Distributed machine learning systems allow developers to handle extremely large datasets across multiple clusters, take advantage of automation tools, and benefit from hardware acceleration. This book reveals best-practice techniques and insider tips for tackling the challenges of scaling machine learning systems.

In Distributed Machine Learning Patterns you will learn how to:

  • Apply distributed systems patterns to build scalable and reliable machine learning projects
  • Build ML pipelines with data ingestion, distributed training, model serving, and more
  • Automate ML tasks with Kubernetes, TensorFlow, Kubeflow, and Argo Workflows
  • Make trade-offs between different patterns and approaches
  • Manage and monitor machine learning workloads at scale

Inside Distributed Machine Learning Patterns you’ll learn to apply established distributed systems patterns to machine learning projects, and explore cutting-edge patterns created specifically for machine learning. Firmly rooted in the real world, this book demonstrates how to apply patterns using examples based on TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. Hands-on projects and clear, practical DevOps techniques let you easily launch, manage, and monitor cloud-native distributed machine learning pipelines.

About the Technology
Deploying a machine learning application on a modern distributed system puts the spotlight on reliability, performance, security, and other operational concerns. In this in-depth guide, Yuan Tang, project lead of Argo and Kubeflow, shares patterns, examples, and hard-won insights on taking an ML model from a single device to a distributed cluster.

About the Book
Distributed Machine Learning Patterns provides dozens of techniques for designing and deploying distributed machine learning systems. In it, you’ll learn patterns for distributed model training, managing unexpected failures, and dynamic model serving. You’ll appreciate the practical examples that accompany each pattern along with a full-scale project that implements distributed model training and inference with autoscaling on Kubernetes.

What's Inside
  • Data ingestion, distributed training, model serving, and more
  • Automating Kubernetes and TensorFlow with Kubeflow and Argo Workflows
  • Managing and monitoring workloads at scale


About the Reader
For data analysts and engineers familiar with the basics of machine learning, Bash, Python, and Docker.

About the Author
Yuan Tang is a project lead of Argo and Kubeflow, maintainer of TensorFlow and XGBoost, and author of numerous open source projects.

Quotes
Approachable for beginners and inspirational for experienced practitioners. As soon as I finished reading, I was ready to start building.
- James Lamb, SpotHero

Exceptionally timely and comprehensive. Its pattern perspective, accompanied by real-world examples and widely adopted systems like Kubernetes, Kubeflow, and Argo, truly set it apart.
- Yuan Chen, Apple

An amazing guide to designing resilient and scalable ML systems for both training and serving models.
- Ryan Russon, Capital One

A wonderful book! Machine learning at scale explained clearly and from first principles!
- Laurence Moroney, Google

Table of contents

  1. Part 1. Basic concepts and background
  2. Chapter 1. Introduction to distributed machine learning systems
  3. Chapter 1. Distributed systems
  4. Chapter 1. Distributed machine learning systems
  5. Chapter 1. What we will learn in this book
  6. Chapter 1. Summary
  7. Part 2. Patterns of distributed machine learning systems
  8. Chapter 2. Data ingestion patterns
  9. Chapter 2. The Fashion-MNIST dataset
  10. Chapter 2. Batching pattern
  11. Chapter 2. Sharding pattern: Splitting extremely large datasets among multiple machines
  12. Chapter 2. Caching pattern
  13. Chapter 2. Answers to exercises
  14. Chapter 2. Summary
  15. Chapter 3. Distributed training patterns
  16. Chapter 3. Parameter server pattern: Tagging entities in 8 million YouTube videos
  17. Chapter 3. Collective communication pattern
  18. Chapter 3. Elasticity and fault-tolerance pattern
  19. Chapter 3. Answers to exercises
  20. Chapter 3. Summary
  21. Chapter 4. Model serving patterns
  22. Chapter 4. Replicated services pattern: Handling the growing number of serving requests
  23. Chapter 4. Sharded services pattern
  24. Chapter 4. The event-driven processing pattern
  25. Chapter 4. Answers to exercises
  26. Chapter 4. Summary
  27. Chapter 5. Workflow patterns
  28. Chapter 5. Fan-in and fan-out patterns: Composing complex machine learning workflows
  29. Chapter 5. Synchronous and asynchronous patterns: Accelerating workflows with concurrency
  30. Chapter 5. Step memoization pattern: Skipping redundant workloads via memoized steps
  31. Chapter 5. Answers to exercises
  32. Chapter 5. Summary
  33. Chapter 6. Operation patterns
  34. Chapter 6. Scheduling patterns: Assigning resources effectively in a shared cluster
  35. Chapter 6. Metadata pattern: Handle failures appropriately to minimize the negative effect on users
  36. Chapter 6. Answers to exercises
  37. Chapter 6. Summary
  38. Part 3. Building a distributed machine learning workflow
  39. Chapter 7. Project overview and system architecture
  40. Chapter 7. Data ingestion
  41. Chapter 7. Model training
  42. Chapter 7. Model serving
  43. Chapter 7. End-to-end workflow
  44. Chapter 7. Answers to exercises
  45. Chapter 7. Summary
  46. Chapter 8. Overview of relevant technologies
  47. Chapter 8. Kubernetes: The distributed container orchestration system
  48. Chapter 8. Kubeflow: Machine learning workloads on Kubernetes
  49. Chapter 8. Argo Workflows: Container-native workflow engine
  50. Chapter 8. Answers to exercises
  51. Chapter 8. Summary
  52. Chapter 9. A complete implementation
  53. Chapter 9. Model training
  54. Chapter 9. Model serving
  55. Chapter 9. The end-to-end workflow
  56. Chapter 9. Summary

Product information

  • Title: Distributed Machine Learning Patterns, Video Edition
  • Author(s): Yuan Tang
  • Release date: January 2024
  • Publisher(s): Manning Publications