Modelli di progettazione dell'IA generativa

by Valliappa Lakshmanan, Hannes Hapke
October 2025
Intermediate to advanced
508 pages
12h 52m
Italian
O'Reilly Media, Inc.

Chapter 8. Addressing Constraints

Deploying LLMs in production environments presents a unique set of challenges that go far beyond simply getting a model to work. While LLMs offer remarkable capabilities, they also demand substantial computational resources, introduce latency concerns, and can quickly become cost-prohibitive at scale. The gap between a proof of concept that works on a single query and a production system serving thousands of users is often overlooked.

In this chapter, we provide patterns that address concerns you’re likely to face when deploying LLMs in production systems. Whether you’re facing hardware limitations, budget constraints, or strict latency requirements, the patterns presented here offer proven strategies for optimizing your LLM deployment.

We’ll explore five key patterns that tackle different aspects of production constraints. The section on the Small Language Model (Pattern 24) shows you how to reduce computational overhead through model distillation and quantization techniques. The section on Prompt Caching (Pattern 25) demonstrates how to eliminate redundant processing and reduce both costs and latency. The section on Optimizing Inference (Pattern 26) covers advanced techniques like continuous batching and speculative decoding to maximize hardware utilization. The section on Degradation Testing (Pattern 27) provides the metrics you need to validate that your LLM-based application is performing well, and it also covers actions that you can take ...
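To make the idea of quantization concrete, here is a minimal sketch of one simple scheme, symmetric per-tensor ("absmax") int8 quantization. This is an assumption chosen for illustration, not necessarily the scheme the Small Language Model pattern uses; production LLM quantization typically works per channel or per group and relies on dedicated libraries. The function names `quantize_int8` and `dequantize_int8` are hypothetical. Each float is mapped to an integer code in [-127, 127] plus one shared scale factor, so a tensor stored as int8 needs one byte per weight instead of four for float32, at the cost of a bounded rounding error.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor (absmax) quantization of floats to int8 codes.

    Hypothetical helper for illustration only.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude onto 127 so every code fits in [-127, 127].
    # An all-zero (or empty) tensor gets a dummy scale of 1.0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp defensively against float drift.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Approximately reconstruct the original floats from int8 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.0, 0.49]
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    # Rounding error per weight is at most half a quantization step.
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes)
    print(f"scale={scale:.6f} max_error={max_err:.6f}")
```

The design choice here is the simplest possible one: a single scale for the whole tensor. It keeps the arithmetic easy to follow, but one large outlier weight stretches the scale and wastes precision for all the others, which is why finer-grained schemes are common in practice.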

ISBN: 9798341671416