Machine Learning Design Patterns

by Valliappa Lakshmanan, Sara Robinson, Michael Munn

October 2020

Intermediate to advanced

405 pages

11h 53m

English

O'Reilly Media, Inc.

Chapter 6. Reproducibility Design Patterns

Software best practices such as unit testing assume that if we run a piece of code, it produces deterministic output:

import unittest

import numpy as np


def sigmoid(x):
    # Logistic function: approaches 0 as x -> -inf and 1 as x -> +inf.
    return 1.0 / (1 + np.exp(-x))


class TestSigmoid(unittest.TestCase):
    def test_zero(self):
        self.assertAlmostEqual(sigmoid(0), 0.5)

    def test_neginf(self):
        self.assertAlmostEqual(sigmoid(float("-inf")), 0)

    def test_inf(self):
        self.assertAlmostEqual(sigmoid(float("inf")), 1)

This sort of reproducibility is difficult in machine learning. During training, machine learning models are initialized with random values and then adjusted based on training data. Even a simple k-means algorithm, as implemented by scikit-learn, requires setting random_state to ensure that it returns the same results each time:

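def cluster_kmeans(X):
    from sklearn import cluster
    # A fixed random_state pins the random centroid initialization,
    # so repeated calls on the same X return the same labels.
    k_means = cluster.KMeans(n_clusters=10, random_state=10)
    labels = k_means.fit(X).labels_[::]
    return labels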
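
To see why the seed matters, here is a minimal sketch of a one-dimensional k-means written with only the Python standard library. It is not scikit-learn's implementation: the function toy_kmeans, its seed parameter (standing in for random_state), and the sample points are all hypothetical, chosen only to show that the random initialization is the source of nondeterminism.

```python
import random


def toy_kmeans(points, k, seed=None, n_iter=10):
    """Toy 1-D k-means (hypothetical); `seed` stands in for random_state."""
    rng = random.Random(seed)        # private RNG, global random state untouched
    centers = rng.sample(points, k)  # random initialization of the k centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point gets the index of its nearest center.
        labels = [
            min(range(k), key=lambda c: abs(p - centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:              # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels


points = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 9.8, 9.9, 10.0]
run_a = toy_kmeans(points, k=3, seed=10)
run_b = toy_kmeans(points, k=3, seed=10)
print(run_a == run_b)  # same seed, same initial centers, same labels
```

With seed=None the initial centers are drawn differently on each run, so the cluster numbering, and sometimes the clustering itself, can change between runs.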

Beyond the random seed, many other artifacts must be held fixed to ensure reproducibility during training. In addition, machine learning consists of different stages, such as training, deployment, and retraining, and it is often important that some things are reproducible across these stages as well.

In this chapter, we’ll look at design patterns that address different aspects of reproducibility. The Transform design pattern captures data preparation dependencies from the model training pipeline to reproduce them during serving. ...

ISBN: 9781098115777