Chapter 9: Implementing Model Servers

In Chapter 8, Considering Hardware for Inference, we discussed the hardware options and optimizations for serving DL models that are available on the Amazon SageMaker platform. In this chapter, we will focus on another important aspect of engineering inference workloads – choosing and configuring model servers.

Model servers, much like application servers for conventional applications, provide a runtime context for serving your DL models. As a developer, you deploy trained models to the model server, which exposes them as REST or gRPC endpoints. The end users of your DL models then send inference requests to these endpoints and receive responses with predictions. The model server can serve ...
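To make this request/response flow concrete, the following is a minimal sketch, not taken from the book, of a toy REST model server built with Flask and PyTorch. The `/invocations` route, the `model.pt` path, and the JSON payload layout are illustrative assumptions; production model servers (which we cover later in this chapter) handle batching, multiple models, and gRPC in addition to REST.

```python
# Minimal illustrative model server: load a trained model once at startup,
# expose it behind a REST endpoint, and return predictions as JSON.
# The route name, model path, and payload format are assumptions for this sketch.
import json

from flask import Flask, jsonify, request
import torch

app = Flask(__name__)

# Load the trained (TorchScript) model once, when the server starts.
model = torch.jit.load("model.pt")  # hypothetical model artifact
model.eval()


@app.route("/invocations", methods=["POST"])
def invocations():
    # Deserialize the request body into a tensor (assumes {"inputs": [[...]]}).
    payload = json.loads(request.data)
    inputs = torch.tensor(payload["inputs"], dtype=torch.float32)

    # Run inference without tracking gradients.
    with torch.no_grad():
        outputs = model(inputs)

    # Return predictions to the caller as JSON.
    return jsonify({"predictions": outputs.tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A client would then send an inference request to the endpoint, for example with `curl -X POST http://localhost:8080/invocations -H "Content-Type: application/json" -d '{"inputs": [[0.1, 0.2, 0.3]]}'`, and receive the predictions in the response body.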
