Chapter 8. Serving
You’ve made a model; now you have to get it out there into the world and start predicting things. This is a process often known as serving the model. That’s a common shorthand for “creating a structure to ensure our system can ask the model to make predictions on new examples, and return those predictions to the people or systems that need them” (so you can see why the shorthand was invented).
In our yarnit.ai online store example, we can imagine that our team has just created a model that is great at predicting the likelihood that a given user will purchase a given product. We need a way for the model to share its predictions with the rest of our system. But how, exactly, should we set this up?
We have a range of possibilities, each with different architectures and trade-offs. They are sufficiently different in approach that it might not be obvious from the list that they are all attempts to solve the same problem: how can we integrate our predictions with the overall system? We could do any of the following:
Load the model into 1,000 servers in Des Moines, Iowa, and feed all incoming traffic to these servers.
Precompute the model’s predictions for the 100,000,000 most commonly seen combinations of yarn products and user queries using a big offline batch-processing job. Write those predictions once a day to a shared database that our system reads, and use a default score of p = 0.01 for anything not in that list (sketched in code after this list).
Create a JavaScript version of the model and ship it to each user's web browser along with the rest of the page, so that predictions are computed directly on the user's own machine.
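To make the second option concrete, here is a minimal sketch of the serving side of the precompute-and-look-up pattern. It assumes the daily batch job has already written its scores somewhere our serving system can read them; the CSV file, the function names, and the DEFAULT_SCORE constant are illustrative stand-ins, not part of any real yarnit.ai system.

# Sketch: serve precomputed predictions with a default fallback.
# Assumes a daily batch job wrote scores for the most common
# (user_id, product_id) pairs; anything missing gets the default.

DEFAULT_SCORE = 0.01  # fallback for pairs the batch job did not cover


def load_score_table(path: str) -> dict[tuple[str, str], float]:
    """Load the day's precomputed scores into memory.

    Assumes a simple CSV of user_id,product_id,score. A production
    system would more likely read from a shared database or key-value
    store refreshed by the batch job.
    """
    scores: dict[tuple[str, str], float] = {}
    with open(path) as f:
        for line in f:
            user_id, product_id, score = line.strip().split(",")
            scores[(user_id, product_id)] = float(score)
    return scores


def purchase_likelihood(
    scores: dict[tuple[str, str], float],
    user_id: str,
    product_id: str,
) -> float:
    """Return the precomputed score, or the default for unseen pairs."""
    return scores.get((user_id, product_id), DEFAULT_SCORE)


if __name__ == "__main__":
    # "daily_scores.csv" is a hypothetical output of the batch job.
    scores = load_score_table("daily_scores.csv")
    print(purchase_likelihood(scores, "user_42", "chunky_merino_blue"))

The appeal of this design is that serving becomes a plain lookup: cheap, fast, and predictable. The price is staleness (scores are up to a day old) and coverage (anything outside the precomputed list falls back to the default).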