Chapter 9. Choose Your Deployment Option

The previous chapters covered the process of going from a product idea to an ML implementation, as well as methods to iterate on this application until you are ready to deploy it.

This chapter covers different deployment options and the trade-offs between them. Different deployment approaches are suited to different sets of requirements. When deciding which one to choose, you’ll want to weigh factors such as latency, hardware and network requirements, privacy, cost, and complexity.

The goal of deploying a model is to allow users to interact with it. We will cover common approaches to achieve this goal, as well as tips to decide between approaches when deploying models.

We will start with the simplest approach to deploying a model: spinning up a web server to serve predictions.

Server-Side Deployment

Server-side deployment consists of setting up a web server that can accept requests from clients, run them through an inference pipeline, and return the results. This solution fits within a web development paradigm, as it treats models as another endpoint in an application. Users have requests that they send to this endpoint, and they expect results.
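To make this concrete, here is a minimal sketch of such an endpoint using Flask. The run_inference function is a hypothetical stand-in for a real inference pipeline (preprocessing, model call, postprocessing); the route name and payload format are illustrative assumptions, not a prescribed API.

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def run_inference(text):
        # Placeholder for a real pipeline: preprocess the input,
        # run the trained model, and postprocess the output.
        return {"label": "positive", "score": 0.87}

    @app.route("/predict", methods=["POST"])
    def predict():
        # Accept a JSON request from a client, run it through the
        # inference pipeline, and return the result.
        payload = request.get_json()
        result = run_inference(payload["text"])
        return jsonify(result)

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)

From the application's point of view, the model is just another endpoint: a client POSTs input data and receives a prediction in response.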

There are two common workloads for server-side models: streaming and batch. Streaming workflows accept requests as they arrive and process them immediately. Batch workflows are run less frequently and process a large number of requests all at once.
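To contrast the two, here is a sketch of a batch job that works through an accumulated backlog of inputs in fixed-size chunks. It reuses the hypothetical run_inference pipeline from the earlier sketch; the chunk size and helper name are illustrative assumptions.

    def run_batch_inference(inputs, batch_size=32):
        # Process a stored backlog of inputs in fixed-size chunks,
        # rather than one request at a time as in the streaming case.
        results = []
        for start in range(0, len(inputs), batch_size):
            chunk = inputs[start:start + batch_size]
            # A real pipeline would typically feed the whole chunk to the
            # model in a single vectorized call to amortize per-call overhead.
            results.extend(run_inference(item) for item in chunk)
        return results

Because a batch job controls when and how work is processed, it can trade latency for throughput, whereas a streaming endpoint must answer each request as quickly as possible.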
