It’s critical to have “humans in the loop” when automating the deployment of machine learning (ML) models, because models often perform worse over time. This course covers the human-directed safeguards that prevent poorly performing models from reaching production and the techniques for evaluating models over time. We’ll use ModelDB to capture the metrics that help you identify poorly performing models. We'll review the many factors that degrade model performance (e.g., changing users and user preferences, stale data) and the variables that lose predictive power. We'll explain how to use classification and prediction scoring methods such as precision-recall, ROC, and Jaccard similarity. We'll also show you how ModelDB lets you track provenance and metrics for model performance and health, how to integrate ModelDB with Spark ML, and how to use the ModelDB APIs to store information when training models in Spark ML. Learners should have basic familiarity with the following: Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; cloud platforms like Amazon Web Services; and Bash, Docker, and REST.
- Learn how to use ModelDB and Spark to track and improve model performance over time
- Understand how to identify poorly performing models and prevent them from deploying into production
- Explore classification and prediction scoring methods for training and evaluating ML models
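To make the scoring methods above concrete, here is a minimal sketch of computing precision, recall, ROC AUC, and Jaccard similarity, then using them as a simple pre-deployment gate. It uses scikit-learn rather than the Spark ML and ModelDB APIs covered in the course, and the labels, scores, and thresholds are all illustrative, not course materials:

```python
from sklearn.metrics import precision_score, recall_score, roc_auc_score, jaccard_score

# Ground-truth labels, hard predictions, and positive-class scores for a
# hypothetical binary classifier evaluated on fresh data
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]

metrics = {
    "precision": precision_score(y_true, y_pred),  # TP / (TP + FP)
    "recall": recall_score(y_true, y_pred),        # TP / (TP + FN)
    "roc_auc": roc_auc_score(y_true, y_prob),      # area under the ROC curve
    "jaccard": jaccard_score(y_true, y_pred),      # TP / (TP + FP + FN)
}

# A simple human-in-the-loop safeguard: instead of deploying automatically,
# flag the model for review when any metric falls below its (illustrative)
# threshold.
thresholds = {"precision": 0.7, "recall": 0.7, "roc_auc": 0.8, "jaccard": 0.5}
needs_review = [name for name, value in metrics.items() if value < thresholds[name]]
print("hold for review:" if needs_review else "ok to deploy", needs_review)
# → ok to deploy []
```

In practice these metrics would be recomputed on recent data and logged to a system like ModelDB so that decay shows up as a trend rather than a one-off number.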
Manasi Vartak is a PhD student in the Database Group at MIT, where she works on systems for the analysis of large-scale data.
Jason Slepicka is a senior data engineer with DataScience, where he builds pipelines and data science platform infrastructure. Jason is working on his PhD in Computer Science at the University of Southern California Information Sciences Institute.
Table of contents
Monitoring and Improving the Performance of Machine Learning Models
- Why Model Maintenance? 00:06:10
- Principles and Causes of Model Decay 00:05:00
- Key Requirements for Model Maintenance 00:06:19
- Deploying a Model Maintenance System with Spark ML 00:11:32
- Reviewing and Monitoring Model Performance 00:06:51
- Title: Monitoring and Improving the Performance of Machine Learning Models
- Release date: December 2017
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491988848