October 2023
Intermediate to advanced
288 pages
8h 46m
English
As the power we unlock from large language models grows, so, too, does the necessity of deploying these models to production so we can share our hard work with more people. This chapter explores different strategies for considering deployments of both closed-source and open-source LLMs, with an emphasis on best practices for model management, preparation for inference, and methods for improving efficiency such as quantization, pruning, and distillation.
For closed-source LLMs, the deployment process typically involves interacting with an API provided by the company that developed the model. This model-as-a-service approach is convenient because the underlying ...
Read now
Unlock full access