Chapter 2. Pre-Training Data
In Chapter 1, we introduced language models, noted their strengths and limitations, explored current and potential use cases, and presented the scaling laws that seemingly govern progress in this field. To set the stage for the rest of this book, in the next three chapters we will discuss in detail the recipe for pre-training LLMs and the ingredients that go into them. But wait, this book is about using pre-trained LLMs to design and build user applications. Why do we need to discuss the nuances of pre-training these gargantuan models from scratch, something most machine learning practitioners will never do in their lives?
Actually, this information is very important because many of the decisions made during pre-training heavily influence downstream performance. As we will see in subsequent chapters, failure modes are much easier to understand when you understand how a model was trained. Just as we appreciate having ingredients listed on the packages at our grocery stores, we would like to know the ingredients that go into a language model before we use it in serious applications.
Note
Not much information is publicly available about some of the proprietary LLMs that are accessible only through an API. This book will provide as much information as has been made public. While the lack of information doesn't mean we should avoid using these models, model transparency is something you might need to consider ...