Chapter 9. Scaling: Hardware, Infrastructure, and Resource Management
Deploying and managing LLMs presents unique challenges and opportunities in the realm of infrastructure and resource management. LLMs, as you’ve seen throughout this book, are computationally intensive, requiring substantial hardware, storage, and network resources to operate efficiently. Whether you’re leveraging LLMs as a cloud-based service, deploying pretrained models in on-premises data centers, or training your own models from scratch, your infrastructure decisions will influence their performance, scalability, and cost-effectiveness.
Effective resource management for LLMs involves optimizing compute power, memory, and storage. In this chapter, we will explore the key components of infrastructure for LLMs, including hardware requirements and deployment strategies. We’ll also discuss best practices for optimizing resource use, managing costs, and maintaining reliability in production environments. This chapter will help you understand the trade-offs involved in managing resources for large-scale AI applications.
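To make these requirements concrete, here is a minimal back-of-envelope sketch of the GPU memory needed just to serve a model. The architecture figures (7B parameters, 32 layers, a 4,096-dimensional hidden size) are illustrative assumptions, not measurements of any specific model.

# Back-of-envelope GPU memory estimate for serving an LLM.
# The 7B / 32-layer / 4096-hidden figures below are illustrative
# assumptions, not the specs of any particular model.

def weights_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights (fp16/bf16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, hidden: int, seq_len: int,
                batch: int, bytes_per_val: int = 2) -> float:
    """KV cache: 2 tensors (K and V) per layer, each seq_len x hidden,
    held per sequence in the batch."""
    return 2 * n_layers * hidden * seq_len * batch * bytes_per_val / 1e9

if __name__ == "__main__":
    w = weights_gb(7e9)                  # ~14 GB of weights at fp16
    kv = kv_cache_gb(32, 4096, 4096, 8)  # batch of 8 at a 4K context
    print(f"weights: {w:.1f} GB, kv cache: {kv:.1f} GB, "
          f"total: {w + kv:.1f} GB")

Even before activations and framework overhead, the weights and KV cache in this sketch come to roughly 31 GB, more than a single 24 GB GPU holds, which is why memory optimization dominates serving decisions.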
Choosing the Right Approach
Selecting the appropriate method for using LLMs depends on the requirements of the application you’re building. For startups or small-scale applications, using models directly from the cloud may be the quickest and most cost-effective solution; the sketch below contrasts this with self-hosting. For enterprises with specialized requirements or high workloads, deploying LLMs on cloud infrastructure can help ...
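These two options differ more in operations than in code. As a rough sketch, assuming the openai and transformers packages are installed and using placeholder model names (not recommendations), the hosted path is a single API call, while the self-hosted path loads weights onto hardware you provision:

# Two ways to get a completion: a hosted API vs. a self-hosted model.
# Assumes the `openai` and `transformers` packages are installed;
# the model names are placeholders, not recommendations.

def hosted_completion(prompt: str) -> str:
    """Cloud-based service: no hardware to manage, pay per token."""
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def self_hosted_completion(prompt: str) -> str:
    """Self-managed deployment: you provision the GPU, weights stay local."""
    from transformers import pipeline
    pipe = pipeline("text-generation",
                    model="mistralai/Mistral-7B-Instruct-v0.2")
    return pipe(prompt, max_new_tokens=128)[0]["generated_text"]

The hosted call offloads all hardware concerns but meters every token; the self-hosted path trades that for a fixed infrastructure cost and full control over the model and the data it processes.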