Chapter 5. Machine Learning Model Evaluation

This chapter is about model evaluation in the context of running production models. You will learn practical ways to treat model evaluation as an ongoing process of improvement: strengthening your models against real data and solving real problems that change over time.

We won’t be covering (offline) model evaluation in the traditional sense; to learn more about that topic, we recommend you check out Alice Zheng’s free eBook Evaluating Machine Learning Models: A Beginner’s Guide to Key Concepts and Pitfalls (O’Reilly).

When it comes to improving models in production, it is critical to compare models against what they did yesterday, or last month, or at some previous point in time. You may also want to compare your models against a known stable canary model, or against a known best model. So, let's dig in.

Why Compare Instead of Evaluate Offline?

In a working production system, there will be many models already in production or in pre- or post-production. All of these models will be generating scores against live data. Some of the models will be the best ones known for a particular problem. Moreover, if that production system is based on a rendezvous-style architecture, it will be very easy and safe to deploy new models so that they score live data in real time. In fact, with a rendezvous architecture it is probably easier to deploy a model into a production setting than it is to gather training data and do offline evaluation. ...
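To make the idea of comparison concrete, here is a minimal sketch of checking a newly deployed model against a known-stable canary model using scores both models produced on the same live requests. The synthetic score arrays, the use of a Kolmogorov-Smirnov test, and the drift threshold are illustrative assumptions for this sketch, not a prescribed part of the rendezvous architecture.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Stand-ins for scores logged on live traffic; in practice these would be
# read from the score logs kept for each model in the production system.
canary_scores = rng.beta(2, 5, size=10_000)       # known-stable canary model
candidate_scores = rng.beta(2.2, 5, size=10_000)  # newly deployed candidate

# Compare the two score distributions. A large KS statistic (or a tiny
# p-value) means the candidate behaves differently from the canary on the
# same data and deserves a closer look before it is promoted.
statistic, p_value = ks_2samp(canary_scores, candidate_scores)
print(f"KS statistic: {statistic:.4f}, p-value: {p_value:.4g}")

DRIFT_THRESHOLD = 0.05  # illustrative threshold; tune for your own system
if statistic > DRIFT_THRESHOLD:
    print("Candidate diverges from canary -- investigate before promoting.")
else:
    print("Candidate score distribution is consistent with the canary.")

The same pattern applies to comparing a model against its own scores from yesterday or last month: swap the canary scores for an archived score log from the earlier time window.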
