Chapter 5. Design Patterns for Resilient Serving

The purpose of a machine learning model is to use it to make inferences on data it hasn’t seen during training. Therefore, once a model has been trained, it is typically deployed into a production environment and used to make predictions in response to incoming requests. Software that is deployed into production environments is expected to be resilient and require little in the way of human intervention to keep it running. The design patterns in this chapter solve problems associated with resilience under different circumstances as it relates to production ML models.

The Stateless Serving Function design pattern allows the serving infrastructure to scale and handle thousands or even millions of prediction requests per second. The Batch Serving design pattern allows the serving infrastructure to asynchronously handle occasional or periodic requests for millions to billions of predictions. These patterns are useful beyond resilience in that they reduce coupling between creators and users of machine learning models.

The Continued Model Evaluation design pattern handles the common problem of detecting when a deployed model is no longer fit-for-purpose. The Two-Phase Predictions design pattern provides a way to address the problem of keeping models sophisticated and performant when they have to be deployed onto distributed devices. The Keyed Predictions design pattern is a necessity to scalably implement several of the design patterns ...

Get Machine Learning Design Patterns now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.