Chapter 10. Build Safeguards for Models

When designing databases or distributed systems, software engineers concern themselves with fault tolerance, the ability of a system to continue working when some of its components fail. In software, the question is not whether a given part of the system will fail, but when. The same principles apply to ML. No matter how good a model is, it will fail on some examples, so you should engineer a system that can gracefully handle such failures.
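As a minimal sketch of this idea (assuming a scikit-learn-style model object and a simple heuristic fallback function, both hypothetical here), a prediction call can be wrapped so that a failure degrades to a sensible default rather than an error shown to the user:

```python
def predict_with_fallback(model, features, fallback_fn):
    """Return a model prediction, degrading gracefully on failure."""
    try:
        # Assumes a scikit-learn-style API where predict takes a list of examples.
        prediction = model.predict([features])[0]
    except Exception:
        # Inference itself failed (malformed features, missing model file, etc.).
        return fallback_fn(features)

    # Guard against silent failures such as a missing or NaN score.
    if prediction is None or prediction != prediction:
        return fallback_fn(features)
    return prediction
```

Catching errors at the inference boundary keeps a failure contained to a single request, and the fallback keeps the product usable even when the model misbehaves.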

In this chapter, we will cover different ways to help prevent or mitigate failures. First, we'll see how to verify the quality of the data we receive and produce, and how to use this verification to decide how to display results to users. Then, we'll look at ways to make a modeling pipeline more robust so it can serve many users efficiently. After that, we'll explore options for leveraging user feedback and judging how a model is performing. We'll end the chapter with an interview with Chris Moody about deployment best practices.
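As a sketch of the first of these ideas (using a hypothetical question-scoring model and an arbitrary minimum-length threshold chosen only for illustration), a quick input check can decide whether to show a prediction at all:

```python
def format_response(question_text, model):
    """Run a quick input check before deciding whether to show a prediction."""
    # Very short or empty inputs are unlikely to yield a meaningful score,
    # so show guidance instead of a prediction.
    if not question_text or len(question_text.split()) < 3:
        return "Please write a longer question so we can score it accurately."

    # Assumes a scikit-learn-style pipeline that accepts raw text.
    score = model.predict([question_text])[0]
    return "Predicted question quality: {:.2f}".format(score)
```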

Engineer Around Failures

Let’s cover some of the most likely ways for an ML pipeline to fail. The observant reader will notice that these failure cases are similar to the debugging tips we saw in “Debug Wiring: Visualizing and Testing”. Indeed, exposing a model to users in production comes with a set of challenges that mirrors those encountered when debugging a model.

Bugs and errors can show up anywhere, but three areas in particular are most ...
