Chapter 8. Model Inference

Note

We would like to acknowledge Clive Cox and Alejandro Saucedo from Seldon for their great contributions to this chapter.

Most of the attention paid to machine learning has been devoted to algorithm development. However, models are not created for their own sake; they are created to be put into production. When people talk about taking a model “to production,” they usually mean performing inference. As introduced in Chapter 1 and illustrated in Figure 1-1, a complete inference solution provides serving, monitoring, and updating functionality.

Model serving

Puts a trained model behind a service that can handle prediction requests (a minimal sketch of such a service follows this list)

Model monitoring

Monitors the model server for irregularities in performance, as well as the accuracy of the underlying model

Model updating

Fully manages the versioning of your models and simplifies the promotion and rollback between versions
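
To make the serving piece concrete, here is a minimal sketch of what “putting a trained model behind a service” can look like. It is not one of the Kubeflow serving tools covered later in this chapter, just an illustration: it assumes a scikit-learn model saved to a hypothetical model.joblib file and uses Flask to expose a single prediction endpoint.

    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Load the trained model once at startup (model.joblib is a placeholder path).
    model = joblib.load("model.joblib")

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expect a JSON body such as {"instances": [[5.1, 3.5, 1.4, 0.2]]}.
        instances = request.get_json()["instances"]
        predictions = model.predict(instances).tolist()
        return jsonify({"predictions": predictions})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

A client could then request predictions with a call such as curl -X POST -H "Content-Type: application/json" -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}' http://localhost:8080/predict. The serving systems discussed in this chapter offer the same basic request/response contract, while adding the monitoring, versioning, and rollout capabilities that a hand-rolled service like this one lacks.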

This chapter will explore each of these core components and define expectations for their functionality. From those expectations, we will establish a list of requirements that an ideal inference solution should satisfy. Lastly, we will discuss the inference offerings that Kubeflow supports and how you can use them to meet those requirements.

Model Serving

The first step of model inference is model serving, which is hosting your model behind a service that you can interface with. Two fundamental approaches to model serving are embedded, where the models ...
