Building Machine Learning Powered Applications

Book description

Learn the skills necessary to design, build, and deploy applications powered by machine learning (ML). Over the course of this hands-on book, you’ll build an example ML-driven application from initial idea to deployed product. Data scientists, software engineers, and product managers, whether experienced practitioners or novices, will learn step by step the tools, best practices, and challenges involved in building a real-world ML application.

Author Emmanuel Ameisen, an experienced data scientist who led an AI education program, demonstrates practical ML concepts using code snippets, illustrations, screenshots, and interviews with industry leaders. Part I teaches you how to plan an ML application and measure success. Part II explains how to build a working ML model. Part III demonstrates ways to improve the model until it fulfills your original vision. Part IV covers deployment and monitoring strategies.

This book will help you:

  • Define your product goal and set up a machine learning problem
  • Build your first end-to-end pipeline quickly and acquire an initial dataset
  • Train and evaluate your ML models and address performance bottlenecks
  • Deploy and monitor your models in a production environment

Table of contents

  1. Preface
    1. The Goal of Using Machine Learning Powered Applications
      1. Use ML to Build Practical Applications
      2. Additional Resources
    2. Practical ML
      1. What This Book Covers
      2. Prerequisites
      3. Our Case Study: ML-Assisted Writing
      4. The ML Process
    3. Conventions Used in This Book
    4. Using Code Examples
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgments
  2. I. Find the Correct ML Approach
  3. 1. From Product Goal to ML Framing
    1. Estimate What Is Possible
      1. Models
      2. Data
    2. Framing the ML Editor
      1. Trying to Do It All with ML: An End-to-End Framework
      2. The Simplest Approach: Being the Algorithm
      3. Middle Ground: Learning from Our Experience
    3. Monica Rogati: How to Choose and Prioritize ML Projects
    4. Conclusion
  4. 2. Create a Plan
    1. Measuring Success
      1. Business Performance
      2. Model Performance
      3. Freshness and Distribution Shift
      4. Speed
    2. Estimate Scope and Challenges
      1. Leverage Domain Expertise
      2. Stand on the Shoulders of Giants
    3. ML Editor Planning
      1. Initial Plan for an Editor
      2. Always Start with a Simple Model
    4. To Make Regular Progress: Start Simple
      1. Start with a Simple Pipeline
      2. Pipeline for the ML Editor
    5. Conclusion
  5. II. Build a Working Pipeline
  6. 3. Build Your First End-to-End Pipeline
    1. The Simplest Scaffolding
    2. Prototype of an ML Editor
      1. Parse and Clean Data
      2. Tokenizing Text
      3. Generating Features
    3. Test Your Workflow
      1. User Experience
      2. Modeling Results
    4. ML Editor Prototype Evaluation
      1. Model
      2. User Experience
    5. Conclusion
  7. 4. Acquire an Initial Dataset
    1. Iterate on Datasets
      1. Do Data Science
    2. Explore Your First Dataset
      1. Be Efficient, Start Small
      2. Insights Versus Products
      3. A Data Quality Rubric
    3. Label to Find Data Trends
      1. Summary Statistics
      2. Explore and Label Efficiently
      3. Be the Algorithm
      4. Data Trends
    4. Let Data Inform Features and Models
      1. Build Features Out of Patterns
      2. ML Editor Features
    5. Robert Munro: How Do You Find, Label, and Leverage Data?
    6. Conclusion
  8. III. Iterate on Models
  9. 5. Train and Evaluate Your Model
    1. The Simplest Appropriate Model
      1. Simple Models
      2. From Patterns to Models
      3. Split Your Dataset
      4. ML Editor Data Split
      5. Judge Performance
    2. Evaluate Your Model: Look Beyond Accuracy
      1. Contrast Data and Predictions
      2. Confusion Matrix
      3. ROC Curve
      4. Calibration Curve
      5. Dimensionality Reduction for Errors
      6. The Top-k Method
      7. Other Models
    3. Evaluate Feature Importance
      1. Directly from a Classifier
      2. Black-Box Explainers
    4. Conclusion
  10. 6. Debug Your ML Problems
    1. Software Best Practices
      1. ML-Specific Best Practices
    2. Debug Wiring: Visualizing and Testing
      1. Start with One Example
      2. Test Your ML Code
    3. Debug Training: Make Your Model Learn
      1. Task Difficulty
      2. Optimization Problems
    4. Debug Generalization: Make Your Model Useful
      1. Data Leakage
      2. Overfitting
      3. Consider the Task at Hand
    5. Conclusion
  11. 7. Using Classifiers for Writing Recommendations
    1. Extracting Recommendations from Models
      1. What Can We Achieve Without a Model?
      2. Extracting Global Feature Importance
      3. Using a Model’s Score
      4. Extracting Local Feature Importance
    2. Comparing Models
      1. Version 1: The Report Card
      2. Version 2: More Powerful, More Unclear
      3. Version 3: Understandable Recommendations
    3. Generating Editing Recommendations
    4. Conclusion
  12. IV. Deploy and Monitor
  13. 8. Considerations When Deploying Models
    1. Data Concerns
      1. Data Ownership
      2. Data Bias
      3. Systemic Bias
    2. Modeling Concerns
      1. Feedback Loops
      2. Inclusive Model Performance
      3. Considering Context
      4. Adversaries
      5. Abuse Concerns and Dual-Use
    3. Chris Harland: Shipping Experiments
    4. Conclusion
  14. 9. Choose Your Deployment Option
    1. Server-Side Deployment
      1. Streaming Application or API
      2. Batch Predictions
    2. Client-Side Deployment
      1. On Device
      2. Browser Side
    3. Federated Learning: A Hybrid Approach
    4. Conclusion
  15. 10. Build Safeguards for Models
    1. Engineer Around Failures
      1. Input and Output Checks
      2. Model Failure Fallbacks
    2. Engineer for Performance
      1. Scale to Multiple Users
      2. Model and Data Life Cycle Management
      3. Data Processing and DAGs
    3. Ask for Feedback
    4. Chris Moody: Empowering Data Scientists to Deploy Models
    5. Conclusion
  16. 11. Monitor and Update Models
    1. Monitoring Saves Lives
      1. Monitoring to Inform Refresh Rate
      2. Monitor to Detect Abuse
    2. Choose What to Monitor
      1. Performance Metrics
      2. Business Metrics
    3. CI/CD for ML
      1. A/B Testing and Experimentation
      2. Other Approaches
    4. Conclusion
  17. Index

Product information

  • Title: Building Machine Learning Powered Applications
  • Author(s): Emmanuel Ameisen
  • Release date: January 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492045113