Designing Machine Learning Systems

Name: Designing Machine Learning Systems
Author: Chip Huyen
ISBN: 9781098107963

May 2022

Intermediate to advanced

386 pages

12h 25m

English

O'Reilly Media, Inc.

Preface
Who This Book Is For; What This Book Is Not; Navigating This Book; GitHub Repository and Community; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Overview of Machine Learning Systems
When to Use Machine Learning; Machine Learning Use Cases; Understanding Machine Learning Systems; Machine Learning in Research Versus in Production; Machine Learning Systems Versus Traditional Software; Summary
2. Introduction to Machine Learning Systems Design
Business and ML Objectives; Requirements for ML Systems; Reliability; Scalability; Maintainability; Adaptability; Iterative Process; Framing ML Problems; Types of ML Tasks; Objective Functions; Mind Versus Data; Summary
3. Data Engineering Fundamentals
Data Sources; Data Formats; JSON; Row-Major Versus Column-Major Format; Text Versus Binary Format; Data Models; Relational Model; NoSQL; Structured Versus Unstructured Data; Data Storage Engines and Processing; Transactional and Analytical Processing; ETL: Extract, Transform, and Load; Modes of Dataflow; Data Passing Through Databases; Data Passing Through Services; Data Passing Through Real-Time Transport; Batch Processing Versus Stream Processing; Summary
4. Training Data
Sampling; Nonprobability Sampling; Simple Random Sampling; Stratified Sampling; Weighted Sampling; Reservoir Sampling; Importance Sampling; Labeling; Hand Labels; Natural Labels; Handling the Lack of Labels; Class Imbalance; Challenges of Class Imbalance; Handling Class Imbalance; Data Augmentation; Simple Label-Preserving Transformations; Perturbation; Data Synthesis; Summary
5. Feature Engineering
Learned Features Versus Engineered Features; Common Feature Engineering Operations; Handling Missing Values; Scaling; Discretization; Encoding Categorical Features; Feature Crossing; Discrete and Continuous Positional Embeddings; Data Leakage; Common Causes for Data Leakage; Detecting Data Leakage; Engineering Good Features; Feature Importance; Feature Generalization; Summary
6. Model Development and Offline Evaluation
Model Development and Training; Evaluating ML Models; Ensembles; Experiment Tracking and Versioning; Distributed Training; AutoML; Model Offline Evaluation; Baselines; Evaluation Methods; Summary
7. Model Deployment and Prediction Service
Machine Learning Deployment Myths; Myth 1: You Only Deploy One or Two ML Models at a Time; Myth 2: If We Don’t Do Anything, Model Performance Remains the Same; Myth 3: You Won’t Need to Update Your Models as Much; Myth 4: Most ML Engineers Don’t Need to Worry About Scale; Batch Prediction Versus Online Prediction; From Batch Prediction to Online Prediction; Unifying Batch Pipeline and Streaming Pipeline; Model Compression; Low-Rank Factorization; Knowledge Distillation; Pruning; Quantization; ML on the Cloud and on the Edge; Compiling and Optimizing Models for Edge Devices; ML in Browsers; Summary
8. Data Distribution Shifts and Monitoring
Causes of ML System Failures; Software System Failures; ML-Specific Failures; Data Distribution Shifts; Types of Data Distribution Shifts; General Data Distribution Shifts; Detecting Data Distribution Shifts; Addressing Data Distribution Shifts; Monitoring and Observability; ML-Specific Metrics; Monitoring Toolbox; Observability; Summary
9. Continual Learning and Test in Production
Continual Learning; Stateless Retraining Versus Stateful Training; Why Continual Learning?; Continual Learning Challenges; Four Stages of Continual Learning; How Often to Update Your Models; Test in Production; Shadow Deployment; A/B Testing; Canary Release; Interleaving Experiments; Bandits; Summary
10. Infrastructure and Tooling for MLOps
Storage and Compute; Public Cloud Versus Private Data Centers; Development Environment; Dev Environment Setup; Standardizing Dev Environments; From Dev to Prod: Containers; Resource Management; Cron, Schedulers, and Orchestrators; Data Science Workflow Management; ML Platform; Model Deployment; Model Store; Feature Store; Build Versus Buy; Summary
11. The Human Side of Machine Learning
User Experience; Ensuring User Experience Consistency; Combatting “Mostly Correct” Predictions; Smooth Failing; Team Structure; Cross-functional Teams Collaboration; End-to-End Data Scientists; Responsible AI; Irresponsible AI: Case Studies; A Framework for Responsible AI; Summary
Epilogue
Index
About the Author

Content preview from Designing Machine Learning Systems

Preface

Ever since the first machine learning course I taught at Stanford in 2017, many people have asked me for advice on how to deploy ML models at their organizations. These questions can be generic, such as “What model should I use?” “How often should I retrain my model?” “How can I detect data distribution shifts?” “How do I ensure that the features used during training are consistent with the features used during inference?”

These questions can also be specific, such as “I’m convinced that switching from batch prediction to online prediction will give our model a performance boost, but how do I convince my manager to let me do so?” or “I’m the most senior data scientist at my company and I’ve recently been tasked with setting up our first machine learning platform; where do I start?”

My short answer to all these questions is always: “It depends.” My long answers often involve hours of discussion to understand where the questioner comes from, what they’re actually trying to achieve, and the pros and cons of different approaches for their specific use case.

ML systems are both complex and unique. They are complex because they consist of many different components (ML algorithms, data, business logic, evaluation metrics, underlying infrastructure, etc.) and involve many different stakeholders (data scientists, ML engineers, business leaders, users, even society at large). ML systems are unique because they are data dependent, and data varies wildly from one use case to the next. ...

Publisher Resources

ISBN: 9781098107956
