Chapter 10: Operationalizing Inference Workloads

In Chapter 8, Considering Hardware for Inference, and Chapter 9, Implementing Model Servers, we discussed how to engineer your deep learning (DL) inference workloads on Amazon SageMaker. We also reviewed how to select appropriate hardware, optimize model performance, and tune model servers to meet specific use case requirements. In this chapter, we will focus on how to operationalize your DL inference workloads once they have been deployed to test and production environments.

In this chapter, we will start by reviewing advanced model hosting options such as multi-model, multi-container, and Serverless Inference endpoints to improve your resource utilization and reduce workload costs. ...
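To give a flavor of one of these hosting options before the detailed walkthroughs, the following minimal sketch deploys a model to a Serverless Inference endpoint with the SageMaker Python SDK. It is an illustration under stated assumptions rather than the book's own example: the S3 model artifact, IAM role ARN, and inference.py entry point are hypothetical placeholders, and the framework versions are assumptions you should adjust to your environment.

    from sagemaker.pytorch import PyTorchModel
    from sagemaker.serverless import ServerlessInferenceConfig

    # Hypothetical model artifact, IAM role, and inference script -- replace with your own.
    model = PyTorchModel(
        model_data="s3://my-bucket/models/model.tar.gz",
        role="arn:aws:iam::111122223333:role/MySageMakerExecutionRole",
        entry_point="inference.py",
        framework_version="1.12",
        py_version="py38",
    )

    # Serverless endpoints are billed per invocation; memory size (1-6 GB, in 1 GB steps)
    # and maximum concurrent invocations are the only capacity settings to choose.
    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=10,
    )

    # Passing a serverless config means no persistent instances are provisioned;
    # the returned predictor can then be invoked with predictor.predict(...).
    predictor = model.deploy(serverless_inference_config=serverless_config)

Because a serverless endpoint scales down between requests, it suits intermittent or unpredictable traffic, whereas the multi-model and multi-container options covered later in the chapter consolidate many models or containers onto shared provisioned instances.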
