Chapter 6. Fine-Tuning
In the previous chapter, we discussed the various factors to take into account when choosing the right LLM for your specific needs, including pointers on how to evaluate LLMs so that you can make an informed choice. Next, let us put these LLMs to work on our tasks.
In this chapter, we will explore the process of adapting an LLM to solve your task of interest, using fine-tuning. We will go through a full example of fine-tuning, covering all the important decisions one needs to make. We will also discuss the art and science of creating fine-tuning datasets.
The Need for Fine-Tuning
Why do we need to fine-tune LLMs? Why doesn’t a pre-trained LLM with few-shot prompts suffice for our needs? Let us look at a couple of examples to drive the point home:
- Use Case 1

Consider you are working on the rather whimsical task of detecting all sentences written in the past tense within a body of text and transforming them to the future tense. To solve this task, you might provide the LLM with a few input-output pairs, each consisting of a past-tense sentence and its corresponding future-tense version. However, the LLM doesn't seem able to tackle this task to your satisfaction, making mistakes in both the identification and transformation steps. In response, you elaborate on your instructions, adding English grammar rules and exceptions to your prompt. You notice an increase in performance. But with each new rule added, your ...
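The few-shot setup described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the instruction wording and example sentences are assumptions, and in practice you would send the assembled prompt to an LLM API of your choice.

```python
# Illustrative input-output pairs: past-tense sentences and their
# future-tense counterparts (hypothetical examples).
EXAMPLES = [
    ("She walked to the store.", "She will walk to the store."),
    ("They finished the project.", "They will finish the project."),
    ("He wrote a letter.", "He will write a letter."),
]

def build_few_shot_prompt(sentence: str) -> str:
    """Assemble the example pairs and the new input into a single prompt."""
    lines = ["Rewrite each past-tense sentence in the future tense.", ""]
    for past, future in EXAMPLES:
        lines.append(f"Past: {past}")
        lines.append(f"Future: {future}")
        lines.append("")
    # The sentence to transform, left open for the model to complete.
    lines.append(f"Past: {sentence}")
    lines.append("Future:")
    return "\n".join(lines)

print(build_few_shot_prompt("The team shipped the release."))
```

As the chapter goes on to note, stuffing ever more grammar rules and exceptions into such a prompt quickly becomes unwieldy, which is precisely the gap that fine-tuning addresses.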