Chapter 8. Model Deployment with TensorFlow Serving

Deploying your machine learning model is the last step before others can use it to make predictions. Unfortunately, deployment falls into a gray zone in how labor is currently divided in the digital world. It isn’t just a DevOps task, since it requires some knowledge of the model architecture and its hardware requirements. At the same time, deployment sits somewhat outside the comfort zone of machine learning engineers and data scientists, who know their models inside out but tend to struggle with deploying them. In this and the following chapter, we want to bridge the gap between these worlds and guide data scientists and DevOps engineers through the steps to deploy machine learning models. Figure 8-1 shows the position of the deployment step in a machine learning pipeline.

Figure 8-1. Model deployments as part of ML pipelines

Machine learning models can be deployed in three main ways: with a model server, in a user’s browser, or on an edge device. The most common way today to deploy a machine learning model is with a model server, which we will focus on in this chapter. The client that requests a prediction submits the input data to the model server and in return receives a prediction. This requires that the client ...
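
The request and response exchange between a client and a model server can be sketched in a few lines of Python. This is a minimal illustration, not a definitive client: it assumes a TensorFlow Serving instance that exposes its REST API on port 8501, and the host `localhost`, the model name `my_model`, and the helper function names are hypothetical placeholders chosen for this example.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build a POST request for TensorFlow Serving's REST predict endpoint.

    The host, port, and model name are assumptions for this sketch.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # The REST API accepts a JSON body with the inputs under "instances".
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predictions(response_body):
    """Extract the list of predictions from the server's JSON response."""
    return json.loads(response_body)["predictions"]


def predict(host, model_name, instances, port=8501, timeout=5.0):
    """Send the input data to the model server and return its predictions."""
    request = build_predict_request(host, model_name, instances, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_predictions(response.read())
```

With a model server running, a call such as `predict("localhost", "my_model", [[1.0, 2.0, 3.0]])` would submit one input example and return the corresponding prediction. The shape of each instance depends on the inputs the deployed model expects.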
