LLM Engineer's Handbook

by Paul Iusztin, Maxime Labonne
October 2024
Intermediate to advanced
522 pages
12h 55m
English
Packt Publishing
Content preview from LLM Engineer's Handbook

Chapter 10: Inference Pipeline Deployment

Deploying the inference pipeline for the large language model (LLM) Twin application is a critical stage in the machine learning (ML) application life cycle. It’s where the most value is added to your business, as it makes your models accessible to your end users. However, successfully deploying AI models can be challenging: they require expensive compute and access to up-to-date features to run inference. To overcome these constraints, it’s crucial to design your deployment strategy carefully so that it meets the application’s requirements, such as latency, throughput, and cost. As we work with LLMs, we must consider the inference optimization techniques presented in Chapter 8, such ...
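The latency/throughput/cost trade-off mentioned above can be made concrete with a toy capacity model. This is a minimal sketch, not a benchmark: the function, its name, and every number in it are hypothetical assumptions chosen only to show how increasing the batch size of an LLM serving endpoint raises per-request latency while raising aggregate throughput much faster.

```python
def serving_profile(batch_size: int,
                    base_step_ms: float = 20.0,
                    step_ms_per_seq: float = 0.5,
                    tokens_per_request: int = 128) -> dict:
    """Toy estimate of one batched decode pass (all numbers are assumptions).

    base_step_ms:     hypothetical time for one decoding step at batch size 1
    step_ms_per_seq:  hypothetical extra step time per additional sequence
    """
    # Each decoding step slows down slightly as more sequences share the GPU.
    step_ms = base_step_ms + step_ms_per_seq * batch_size
    # Every request in the batch waits for all tokens_per_request steps.
    latency_ms = tokens_per_request * step_ms
    # The whole batch finishes together, so throughput scales with batch size.
    throughput_rps = batch_size / (latency_ms / 1000.0)
    return {
        "batch_size": batch_size,
        "latency_ms": latency_ms,
        "throughput_rps": round(throughput_rps, 2),
    }

for bs in (1, 8, 32):
    print(serving_profile(bs))
```

Under these made-up constants, going from batch size 1 to 32 increases per-request latency by under 2x but throughput by over 18x, which is why batching strategy is a first-order deployment decision alongside hardware cost.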


You might also like

AI Engineering

Chip Huyen

Publisher Resources

ISBN: 9781836200079

Supplemental Content