8

Considering Hardware for Inference

In Part 3 of this book, Serving Deep Learning Models, we will focus on how to develop, optimize, and operationalize inference workloads for deep learning (DL) models. Like training, DL inference is computationally intensive; it requires an understanding of the specific types of hardware built for inference, model optimization techniques, and the specialized model servers that manage model deployment and handle inference traffic. Amazon SageMaker provides a wide range of capabilities to address these aspects.

In this chapter, we will discuss hardware options and model optimization techniques for model serving. We will review the hardware accelerators suitable for DL inference and discuss how to select ...
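As a concrete point of reference, the snippet below is a minimal sketch of deploying a trained PyTorch model to a SageMaker real-time endpoint using the SageMaker Python SDK; the instance_type argument passed to deploy() is where the hardware choice discussed in this chapter takes effect. The S3 artifact path, IAM role, and entry-point script here are hypothetical placeholders, not values from this book.

```python
from sagemaker.pytorch import PyTorchModel

# Hypothetical placeholders -- substitute your own artifact, role, and script.
model = PyTorchModel(
    model_data="s3://my-bucket/models/model.tar.gz",       # packaged model artifact (placeholder)
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # execution role (placeholder)
    entry_point="inference.py",                            # custom inference handler (placeholder)
    framework_version="1.12",
    py_version="py38",
)

# instance_type encodes the hardware decision, for example:
#   ml.c5.*   -- CPU instances, lower cost, suitable for smaller models
#   ml.g4dn.* -- NVIDIA T4 GPU, general-purpose DL inference
#   ml.inf1.* -- AWS Inferentia accelerator (requires a Neuron-compiled model)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
```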
