Chapter 11. Representation Learning and Embeddings
In the previous chapter, we learned how to interface language models with external tools, including data stores. External data can take the form of text files, database tables, and knowledge graphs, and can span a wide variety of content types, from proprietary domain-specific knowledge bases to intermediate results and outputs generated by LLMs.
If the data are structured, for example residing in a relational database, the language model can issue a SQL query to retrieve what it needs, as in the sketch that follows.
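As a quick, hedged illustration of that structured case, the snippet below answers the question with an ordinary SQL lookup. The executives table, its columns, and the placeholder row are hypothetical, created in memory only to keep the example self-contained:

import sqlite3

# Hypothetical table of executive appointments; real data would live in an
# existing database rather than be created on the fly like this.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE executives (company TEXT, title TEXT, name TEXT, start_date TEXT)"
)
conn.execute(
    "INSERT INTO executives VALUES ('Apple', 'CFO', '<name>', '<start date>')"
)

# The question "When did Apple's CFO join?" becomes a structured query.
row = conn.execute(
    "SELECT name, start_date FROM executives "
    "WHERE company = 'Apple' AND title = 'CFO'"
).fetchone()
print(row)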
But what if the data are unstructured? One way to retrieve data from unstructured text datasets is to search by keywords or use regular expressions. For the Apple CFO example in the previous chapter, we can retrieve text containing CFO mentions from a corpus of financial disclosures, hoping that it will also contain the join date or tenure information. For instance, we can use the regex:
pattern = r"(?i)\b(?:C\.?F\.?O|Chief\s+Financial\s+Officer)\b"
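To see how such a pattern would be used, the sketch below scans a small list of strings and keeps those with at least one match. The sample snippets and the find_cfo_mentions helper are illustrative stand-ins, not part of any real disclosure corpus:

import re

CFO_PATTERN = re.compile(r"(?i)\b(?:C\.?F\.?O|Chief\s+Financial\s+Officer)\b")

def find_cfo_mentions(documents):
    """Return the documents that contain at least one CFO mention."""
    return [doc for doc in documents if CFO_PATTERN.search(doc)]

# Illustrative snippets standing in for a corpus of financial disclosures.
corpus = [
    "The Chief Financial Officer presented the quarterly results.",
    "Revenue grew 8% year over year.",
    "The C.F.O. discussed capital allocation during the call.",
]

print(find_cfo_mentions(corpus))
# The first and third snippets match, yet neither says anything about a
# join date or tenure.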
Keyword search is limited in its effectiveness. There are many ways to express a CFO's join date or tenure in a corpus, assuming the information is present at all, and a catch-all regex like the one above matches every CFO mention, most of which say nothing about tenure. The result is a large proportion of false positives, while paraphrases that avoid the keywords are missed entirely.
Therefore, we need to move beyond keyword search. Over the last few decades, the field of information retrieval has developed several methods like BM25 that have shaped search systems. We will learn more about ...