9

Moving LLMs into Production

Introduction

As large language models grow more powerful, it becomes more important to deploy them to production so that we can share our hard work with more people. This chapter explores strategies for deploying both closed-source and open-source LLMs, with an emphasis on best practices for model management, preparation for inference, and efficiency techniques such as quantization, pruning, and distillation.

Deploying Closed-Source LLMs to Production

For closed-source LLMs, the deployment process typically involves interacting with an API provided by the company that developed the model. This model-as-a-service approach is convenient because the underlying ...
