Chapter 13. Model Serving Infrastructure

Just like any other application, your ML models can be trained and deployed on premises, on your own hardware infrastructure. However, this approach requires you to procure the hardware (physical machines) and the GPUs needed to train and serve large models (deep neural networks, or DNNs). This can be viable for large companies that run and maintain ML applications over a long period.

A more viable option for small to medium-size businesses and individual teams is to deploy in the cloud and leverage the hardware infrastructure provided by cloud service providers such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. Most popular cloud service providers offer specialized training and deployment solutions for ML models, such as AutoML on GCP and Amazon SageMaker Autopilot on AWS.

When you’re deploying ML models on premises (on your own hardware infrastructure), you can use a prebuilt, open source model server such as TensorFlow Serving, KServe, or NVIDIA Triton.
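For example, TensorFlow Serving loads models saved in the SavedModel format and expects each version of a model to live in its own numbered subdirectory. The following sketch shows one way a trained Keras model might be exported in that layout; the model itself, the export path, and the model name my_model are placeholders, not values prescribed by TensorFlow Serving.

```python
import tensorflow as tf

# Stand-in model; in practice this would be your trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

# TensorFlow Serving watches a base directory (here, a hypothetical
# /models/my_model) and serves each numbered subdirectory as a version.
export_path = "/models/my_model/1"
tf.saved_model.save(model, export_path)
```

Once the SavedModel is in place, the model server can be pointed at the base directory and will pick up new versions as they are written.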

If you choose to deploy ML models in the cloud, you can deploy trained models on virtual machines (VMs) such as Amazon EC2 or Google Compute Engine instances and use a model server such as TensorFlow Serving to serve inference requests. Alternatively, you can use a managed compute cluster offering such as Google Kubernetes Engine.
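Whether the model server runs on premises or on a cloud VM, clients typically send inference requests over HTTP or gRPC. The sketch below assumes a TensorFlow Serving instance with its REST API exposed on port 8501 and a model named my_model; the host address and the input values are placeholders.

```python
import json
import requests

# Hypothetical endpoint: replace <vm-external-ip> with the address of
# the VM (or load balancer) running TensorFlow Serving.
url = "http://<vm-external-ip>:8501/v1/models/my_model:predict"

# The REST predict API accepts a JSON body with an "instances" list,
# one entry per example to score.
payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}

response = requests.post(url, data=json.dumps(payload))
response.raise_for_status()

# The response body contains a "predictions" list in the same order
# as the submitted instances.
print(response.json()["predictions"])
```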

Cloud service providers also offer solutions for managing the entire ML workflow, including data cleaning, data preparation, feature ...
