Chapter 2. Introduction to LLMOps
The size and architectural complexity of LLMs can make productionizing these models incredibly hard. Productionizing means not just deploying a model but also monitoring it, evaluating it, and optimizing its performance.
There are constantly new challenges. Depending on your application, these may include how to process data, how to store and dynamically adapt prompts, how to monitor user interaction, and—most pressing—how to prevent the model from spreading misinformation or memorizing training data (which can lead it to release personal information). That’s why operationalizing LLMs, which means managing them day-to-day in production, requires a new framework.
LLMOps, as it’s called, is an operational framework for putting LLM applications in production. Although its name and principles are inspired by its older siblings, MLOps and DevOps, LLMOps is significantly more nuanced. The LLMOps framework can help companies reduce technical debt, maintain compliance, deal with LLMs’ dynamic and experimental nature, and minimize operational and reputational risk by avoiding common pitfalls.
This chapter starts by discussing what LLMOps is and how and where it departs from MLOps. We’ll then introduce you to the LLMOps engineer role and where it fits into existing ML teams. From there, we’ll look at how to measure LLMOps readiness within teams, assess your organization’s LLMOps maturity, and identify crucial KPIs for measuring success. Toward the end of ...