Chapter 4. Adapting LLMs To Your Use Case

In this chapter, we will continue our journey through the LLM landscape, exploring the various LLMs available for commercial use and providing pointers on how to choose the right LLM for your task. We will also examine how to load LLMs of various sizes and run inference on them, and then decipher the various decoding strategies for text generation. Finally, we will investigate how to interpret the outputs and intermediate results from language models, surveying interpretability tools like LIT-NLP.

Navigating the LLM Landscape

Seemingly there is a new LLM being released every few days, many of them claiming to be state-of-the-art. Most of these LLMs are not too different from each other, so you need not necessarily spend too much time tracking new LLM releases. This book’s Github repository attempts to keep track of the major releases here, but I don’t promise it will be complete.

Nevertheless, it is a good idea to have a broad understanding of the different types of LLM providers out there, the kinds of LLMs being made available, and the copyright and licensing implications. Therefore, let’s now explore the LLM landscape from this lens and understand the choices at our disposal.

Who are the LLM providers?

LLM providers can be broadly categorized into the following types:

  • Companies providing proprietary LLMs: These include companies like Open AI (GPT), Google (Gemini), Anthropic (Claude), Cohere, AI21 etc. who train proprietary LLMs and make them available as an API endpoint (LLM-as-a-service). Many of these companies have also partnered with cloud providers who facilitate access to these models as a fully managed service. The relevant offerings from the major cloud providers are AWS Bedrock and Sagemaker JumpStart by Amazon, Vertex AI by Google, and Azure Open AI by Microsoft.

  • Companies providing open-source LLMs: These include companies who make the LLM weights public and monetize through providing deployment services (Together AI), companies whose primary business would benefit from more LLM adoption (Cerebras), and research labs who have been releasing LLMs since the early days of Transformers (Microsoft, Google, Meta, Salesforce, etc.). Note that companies like Google have released both proprietary and open-source LLMs.

  • Self-organizing open-source collectives and community research organizations: This includes the pioneering community research organizations Eleuther AI and Big Science. These organizations rely on donations for compute infrastructure.

  • Academia and government: Due to the high capital costs, not many LLMs have come out of academia so far. Examples of LLMs from government/academia include the Abu Dhabi government-funded Technology Innovation Institute, which released the Falcon model, and Tsinghua University, which released the GLM model.

Table 4-1 shows the various players in the LLM space, the category of entity they belong to, and the various pre-trained models they have published.

Table 4-1. LLM Providers
Name | Category | Pre-trained Models Released
Google | Company | BERT, MobileBERT, T5, Flan-T5, ByT5, Canine, UL2, Flan-UL2, Pegasus, PaLM, PaLM 2, ELECTRA, Tapas, Switch
Microsoft | Company | DeBERTa, DialoGPT, BioGPT, MPNet
Open AI | Company | GPT-2, GPT-3, GPT-3.5, GPT-4
Amazon | Company | Titan
Anthropic | Company | Claude, Claude-2
Cohere | Company | Cohere Command, Cohere Base
Meta | Company | RoBERTa, Llama, Llama2, BART, OPT, Galactica
Salesforce | Company | CTRL, XGen, EinsteinGPT
MosaicML | Company (acquired by Databricks) | MPT
Cerebras | Company | Cerebras-GPT, BTLM
Databricks | Company | Dolly-V1, Dolly-V2
Stability AI | Company | StableLM
Together AI | Company | RedPajama
Ontocord AI | Non-profit | MDEL
Eleuther AI | Non-profit | Pythia, GPT-Neo, GPT-NeoX, GPT-J
Big Science | Non-profit | BLOOM
Tsinghua University | Academic | GLM
Technology Innovation Institute | Academic | Falcon
UC Berkeley | Academic | OpenLLaMA
Adept AI | Company | Persimmon
Mistral AI | Company | Mistral
AI21 Labs | Company | Jurassic
X.AI | Company | Grok

Model flavors

Each model is usually released with multiple variants. It is customary to release different-sized variants of the same model. As an example, Llama2 comes in 7B, 13B, and 70B sizes, where these numbers refer to the number of parameters in the model.

These days, LLM providers augment their pre-trained models in various ways to make them more amenable to user tasks. The augmentation process typically involves fine-tuning the model in some way, often incorporating human supervision. Some of these fine-tuning exercises can cost millions of dollars in terms of human annotations. We will refer to pre-trained models that have not undergone any augmentation as base models.

Here are some of the popular augmentation types:

Instruct-models

Instruct-models, or Instruction-tuned models, are specialized in following instructions written in natural language. While base models possess powerful capabilities, they are akin to a rebellious teenager; effectively interacting with them is possible only after tediously engineering the right prompts through trial-and-error, which tend to be brittle. This is because the base models are trained on either denoising objectives or next-word prediction objectives, which is different from the tasks users typically want to solve. By instruction-tuning the base model, the resulting model is able to more effectively respond to human instructions and be helpful.

A typical instruction-tuning dataset consists of a diverse set of tasks expressed in natural language, along with input-output pairs. In Chapter 6, we will explore various techniques to construct instruction-tuning datasets, and demonstrate how to perform instruction-tuning on a model.

Here is an example from a popular instruction-tuning dataset called FLAN.

Input:

“What is the sentiment of the following review? The pizza was ok but the service was terrible. I stopped in for a quick lunch and got the slice special but it ended up taking an hour after waiting several minutes for someone at the front counter and then again for the slices. The place was empty other than myself, yet I couldn’t get any help/service. OPTIONS: - negative - positive”

Target:

“Negative”

In this example, the instruction ‘What is the sentiment of the following review?’ is expressed the way a human would naturally phrase it, along with the input and the target output. The input is the actual review and the output is the solution to the task, either generated by a model or annotated by a human.

Figure 4-1 demonstrates the instruction-tuning process.

Figure 4-1. Instruction-tuning process

Instruction-tuning is one of several techniques that come under the umbrella of Supervised Fine-tuning (SFT). In addition to improving the ability of a model to respond effectively to user tasks, SFT-based approaches can also be used to make it less harmful, by training on safety datasets that help align model outputs with the values and preferences of the model creators.

More advanced techniques to achieve this alignment include reinforcement learning-based methods like RLHF (Reinforcement Learning from Human Feedback) and RLAIF (Reinforcement Learning from AI Feedback).

In RLHF training, human annotators select or rank candidate outputs based on certain criteria, like helpfulness and harmlessness. These annotations are used to train a reward model, which is then used to iteratively fine-tune the LLM, ultimately making it more controllable, for example, by refusing to answer inappropriate requests from users.
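
To make the reward modeling step more concrete, here is a minimal sketch of the pairwise (Bradley-Terry style) loss commonly used to train reward models from ranked annotations. The function and tensor names are illustrative assumptions, not code from any particular RLHF library.

import torch
import torch.nn.functional as F

def reward_pair_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # reward_chosen / reward_rejected: scalar scores the reward model assigns to the
    # human-preferred and human-rejected responses for the same prompt.
    # The loss pushes the chosen reward above the rejected reward.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative usage with dummy scores for a batch of three preference pairs
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.4, 0.9, -0.1])
print(reward_pair_loss(chosen, rejected))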

Figure 4-2 shows the RLHF training process.

Figure 4-2. Reinforcement Learning from Human Feedback

We will cover RLHF and other alignment techniques in detail in Chapter 11, including algorithms like PPO (Proximal Policy Optimization), DPO (Direct Preference Optimization), KTO (Kahneman-Tversky Optimization) and Rejection Sampling, as well as pointers on how to facilitate the human feedback process.

Instead of relying on human feedback for alignment training, one can also leverage LLMs to choose between outputs based on their adherence to a set of principles (don’t be racist, don’t be rude, etc.). This technique was introduced by Anthropic and is called RLAIF. In this technique, humans only provide a desired set of principles and values (referred to as a constitution, hence the name Constitutional AI), and the LLM is tasked with determining whether its outputs adhere to these principles.

Examples of instruction-tuned models include Open AI’s GPT-3.5-turbo-instruct, Cohere’s Command model, MPT-Instruct, RedPajama-Instruct etc.

Chat-models

Chat-models are a type of instruction-tuned models that are optimized for multi-turn dialog. Examples include ChatGPT, Llama2-Chat, MPT-Chat, OpenAssistant etc. In Chapter 6 we will discuss how to generate and structure dialog datasets for training, including the ChatML format used by many models.

Long-context models

As discussed in Chapter 2, Transformer-based LLMs have a limited context length. To recap, context length typically refers to the sum of the number of input and output tokens processed by the model per invocation. Typical context lengths of modern LLMs range from 2,000 to 8,000 tokens, with some models like Anthropic’s Claude 3 supporting over 200,000 tokens, and Gemini 1.5 Pro supporting over a million tokens. Some models are released with a long-context variant; for example, GPT-3.5 comes with a default 4k context size but also has a 16k context size variant. MPT also has a long-context variant that has been trained on a 65k context length but can potentially be used for even longer contexts during inference.

Domain-adapted or task-adapted models

LLM providers also might perform fine-tuning on specific tasks like summarization or financial sentiment analysis. They may also produce distilled versions of the model, where a smaller model is fine-tuned on outputs from the larger model for a particular task. Examples of task-specific fine-tunes include FinBERT, which is fine-tuned on financial sentiment analysis datasets, and UniversalNER, which is distilled using named-entity-recognition data.

Open-source LLMs

The term open-source these days is often used as a catch-all phrase to refer to models that have some aspect of them publicly available. We will define open-source as:

Software artifacts that are released under a license that allows users to study, use, modify, and redistribute them to anyone and for any purpose.

For a more formal and comprehensive definition of open-source software, refer to the Open Source Initiative’s official definition.

For an LLM to be considered fully open, all of the following needs to be published:

  • Model Weights: This includes all the parameters of the model and the model configuration. Having access to this enables us to add to or modify the parameters of the model in any way we deem fit. Model checkpoints at various stages of training are also encouraged to be released.

  • Model Code: Releasing only the weights of the model is akin to providing a software binary without providing the source code. Model code includes not only the model training code and hyperparameter settings, but also the code used for pre-processing the training data. Releasing information about infrastructure setup and configuration also goes a long way towards enhancing the reproducibility of the model. In most cases, even with model code fully available, models may not be easily reproducible due to resource limitations and the non-deterministic nature of training.

  • Training data: This includes the training data used for the model, and ideally information or code on how it was sourced. It is also encouraged to release data at different stages of transformation of the data pre-processing pipeline, as well as the order in which the data was fed to the model. Training data is the component that is least published by model providers. Thus, most open-source models are not fully open because the dataset is not public.

Training data is often not released due to competitive reasons. As discussed in earlier chapters, most LLMs today use variants of the same architecture and training code. The distinguishing factor can often be the data content and pre-processing. Parts of the training data might be acquired under a licensing agreement, which prohibits the model provider from releasing the data publicly.

Another reason for not releasing training data is that there are unresolved legal issues pertaining to training data, especially surrounding copyright. As an example, the Pile dataset created by Eleuther AI is no longer available at the official location because it contains text from copyrighted books (the Books3 dataset). Note that the Pile is pre-processed so the books are not in human-readable form and are not easily reproducible, as they are split, shuffled, and mixed together.

Most training data is sourced from the open Web and thus may contain violent or sexual content that is illegal in certain jurisdictions. Despite the best intentions and rigorous filtering, some of this data might still be present in the final dataset. Thus, many datasets that were previously open are no longer available, LAION’s image datasets being one example.

Ultimately, the license under which the model has been released determines the terms under which you can use, modify, or redistribute the original or modified LLM. Broadly speaking, open LLMs are distributed under three types of licenses:

  • Non-commercial: These licenses only allow research and personal use and prohibit the use of the model for commercial purposes. In many cases, the model artifacts are gated through an application form where a user has to justify their need for access by providing a compelling research use-case.

  • Copy-left: This type of license permits commercial usage, but all source or derivative work needs to be released under the same license, thus making it harder to develop proprietary modifications. The degree to which this condition applies depends on the specific license being used.

  • Permissive: This type of license permits commercial usage, including modifying and redistributing the model in proprietary applications; i.e., there is no obligation for the redistribution to be open-source. Some licenses in this category also include an explicit patent grant.

In recent times, new types of licenses are being devised that restrict usage of the model for particular use cases, often for safety reasons. An example of this is the Open RAIL-M license, which prohibits usage of the model in use cases like providing medical advice, law enforcement, immigration and asylum processes etc. For a full list of restricted use cases, see Attachment A of the license.

As a practitioner intending to use open LLMs in your organization for commercial reasons, it is best to use ones with permissive licenses. Popular examples of permissive licenses include the Apache 2.0 and the MIT license.

CC (Creative Commons) licenses are a popular class of licenses used to distribute open LLMs. These licenses have names like CC-BY-NC-SA. Here is an easy way to remember what these names mean:

  • BY: If the license contains this term, it means attribution is needed. If it only contains this term (CC-BY), it means the license is permissive.

  • SA: If the license contains this term, it means redistribution should occur under the same terms as this license. In other words, it is a copy-left license.

  • NC: NC stands for Non-commercial. Thus, if the license contains this term, the model can only be used for research or personal use cases.

  • ND: ND stands for No-derivatives. If the license contains this term, then distribution of modifications to the model is not allowed.

Note

Today, models that have open weights and open code and are released under a license that allows redistribution to anyone and for any use case are considered open-source models. Arguably, though, access to the training data is also crucial to inspect and study the model, which is part of the open-source definition we introduced earlier.

Table 4-2 shows various available LLMs, the licenses under which they are published, the sizes they are available in, and the variants in which they are available. Note that the LLM may be instruction-tuned or chat-tuned by a different entity than the one that pre-trained the LLM.

Table 4-2. List of available LLMs
Name | Availability | Sizes | Variants
GPT-4 | Proprietary | Unknown | GPT-4 32K context, GPT-4 8K context
GPT-3.5 Turbo | Proprietary | Unknown | GPT-3.5 4K context, GPT-3.5 16K context
Claude Instant | Proprietary | Unknown | -
Claude 2 | Proprietary | Unknown | -
MPT | Apache 2.0 | 1B, 7B, 30B | MPT 65K StoryWriter
CerebrasGPT | Apache 2.0 | 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, 13B | CerebrasGPT
StableLM | CC-BY-SA | 7B | -
RedPajama | Apache 2.0 | 3B, 7B | RedPajama-INCITE-Instruct, RedPajama-INCITE-Chat
GPT-NeoX | Apache 2.0 | 20B | -
BLOOM | Open, restricted use | 176B | BLOOMZ
Llama | Open, no commercial use | 7B, 13B, 33B, 65B | -
Llama2 | Open, commercial use | 7B, 13B, 70B | Llama2-Chat
Zephyr | Apache 2.0 | 7B | -
Gemma | Open, restricted use | 2B, 7B | Gemma-Instruction Tuned

How to choose an LLM for your task

Given the plethora of options available, how do you ensure you choose the right LLM for your task? Depending on your situation, there are a multitude of criteria to consider, including:

  • Cost - This includes inference or fine-tuning costs, and costs associated with building software scaffolding, monitoring and observability, deployment and maintenance (collectively referred to as LLMOps).

  • Time Per Output Token (TPOT) - This is a metric used to measure the speed of text generation as experienced by the end user.

  • Task performance - This refers to the performance requirements of the task, and the relevant metrics like precision or accuracy. What level of performance is good enough?

  • Type of tasks - The nature of the tasks the LLM will be used for, like summarization, question answering, classification etc.

  • Capabilities required - Examples of capabilities include arithmetic reasoning, logical reasoning, planning, task decomposition etc. Many of these capabilities, to the extent that they actually exist or are approximated, are emergent properties of an LLM as discussed in Chapter 1, and are not exhibited by smaller-sized models.

  • Licensing - You can use only those models that allow your mode of usage. Even models that explicitly allow commercial use can have restrictions on certain types of use cases. For example, as noted earlier, the BigScience Open RAIL-M license restricts the usage of the LLM in use cases pertaining to law enforcement, immigration or asylum processes etc.

  • In-house ML/MLOps talent - The strength of in-house talent determines the customizations you can afford. For example, do you have enough in-house talent for building inference optimization systems?

  • Other non-functional criteria - This includes safety, security, privacy etc. Cloud providers and startups are already implementing solutions that can address these issues.

Figure 4-3 shows a flow chart that illustrates how these criteria interact with each other and how you can make a decision regarding the kind of LLM you might want to choose for your task.

Figure 4-3. Flowchart for choosing an LLM

Often, the biggest question you may have to resolve is whether to use proprietary or open-source LLMs.

Open-source vs. Proprietary LLMs

Debates about the merits of open-source vs proprietary software have been commonplace in the tech industry for several decades now, and they are becoming increasingly relevant in the realm of LLMs as well. The biggest advantage of open-source models is the transparency and flexibility they provide, not necessarily the cost. Self-hosting open-source LLMs can incur a lot of engineering overhead and compute/memory costs, and managed services might not always match proprietary models in terms of latency, throughput, and inference cost. Moreover, many open-source LLMs are not easily accessible through managed services and other third-party deployment options. This situation is bound to change dramatically as the field matures, but in the meanwhile, run through the calculations for your specific situation to determine the costs incurred for using each (type of) model.

The flexibility provided by open-source models helps with debuggability, interpretability, and the ability to augment the LLM with any kind of training/fine-tuning you choose, instead of the restricted avenues made available by the LLM provider. This allows you to more substantially align the LLM towards your preferences and values instead of the ones decided by the LLM provider. Having full availability of all the token probabilities (logits) is a superpower, as we will see throughout the book.

The availability of open-source LLMs has enabled teams to develop models and applications that might not be lucrative for larger companies with a profit motive, like fine-tuning models to support low-resource languages (languages which do not have a significant data footprint on the Internet, like regional languages of India or Indigenous languages of Canada). An example is the Kannada Llama model, built over Llama2 by continually pre-training and fine-tuning on tokens from the Kannada language, a regional language of India.

Not all open-source models are fully transparent. As mentioned earlier, most for-profit companies that release open-source LLMs do not make the training datasets public. For instance, Meta hasn’t disclosed all the details of the training datasets used to train the Llama2 model. Knowing which datasets are used to train the model can help you assess whether there is test set contamination, and understand what kind of knowledge you can expect the LLM to possess.

As of this book’s writing, proprietary LLMs like GPT-4 represent the state-of-the-art and haven’t been matched by any open-source counterpart yet. Thus they currently have the upper hand in terms of performance and convenience. Throughout this book, we will showcase scenarios where open-source models have an advantage.

Tip

Always check if the model provider has an active developer community on Github/Discord/Slack, and that the development team is actively engaged in those channels, responding to user comments and questions. I recommend preferring models with active developer communities, provided they satisfy your primary criteria.

LLM Evaluation

We will start this section with a caveat: evaluating LLMs is probably the most challenging task in the LLM space at present. Current methods of benchmarking are broken, easily gamed, and hard to interpret. Nevertheless, benchmarks are still a useful starting point on your road towards evaluation. We begin by looking at current public benchmarks and then discuss how you can build more holistic internal benchmarks.

To evaluate LLMs on their task performance, there exist many benchmark datasets that test a wide variety of skills. Not all skills are relevant to your use case, so you can choose to focus on specific benchmarks that test the skills you need the LLM to perform well on.

The leaderboard on these benchmark tests changes very often, especially when only open-source models are being evaluated, but that does not mean you need to change the LLMs you use every time there is a new leader on the leaderboard. Usually, the differences between the top models are quite marginal. The fine-grained choice of LLM usually isn’t the most important criterion determining the success of your task, and you are better off spending that bandwidth on cleaning and understanding your data, which is still the most important component of the project.

Let’s look at a few popular ways in which the field is evaluating LLMs.

Eleuther AI LM Evaluation Harness

Through the LM Evaluation Harness, Eleuther AI supports over 400 different benchmark tasks, evaluating skills as varied as open-domain question answering, arithmetic and logical reasoning, linguistic tasks, machine translation, and toxic language detection. You can use this tool to evaluate any model on the HuggingFace Hub, a platform containing thousands of pre-trained and fine-tuned models, on the benchmarks of your choice.

Here is an example from one of the benchmark tasks called bigbench_formal_fallacies_syllogisms_negation.

{
    "input": "\"Some football fans admire various clubs, others love
        only a single team. But who is a fan of whom precisely? The
        following argument pertains to this question: First premise: Mario
        is a friend of FK \u017dalgiris Vilnius. Second premise: Being a
        follower of F.C. Copenhagen is necessary for being a friend of FK
        \u017dalgiris Vilnius. It follows that Mario is a follower of F.C.
        Copenhagen.\"\n Is the argument, given the explicitly stated
        premises, deductively valid or invalid?",
    "target_scores": {
        "valid": 1,
        "invalid": 0
    }
}

In this task, the model is asked to spot logical fallacies by deducing whether the presented argument is valid given the premises.

There is also support for evaluation of proprietary models using this harness. For example, here is how you would evaluate Open AI models.

export OPENAI_API_SECRET_KEY=<Key>
lm_eval --model openai-completions \
    --model_args model=gpt-3.5-turbo \
    --tasks bigbench_formal_fallacies_syllogisms_negation
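
If you prefer calling the harness from Python instead of the command line, recent versions expose a simple_evaluate function. The call below is a minimal sketch based on that interface; the exact arguments are an assumption and may need adjusting for the version you have installed.

import lm_eval

# Evaluate an open model from the HuggingFace Hub on a single benchmark task
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-1.4b",
    tasks=["bigbench_formal_fallacies_syllogisms_negation"],
    num_fewshot=0,
)
print(results["results"])
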
Tip

While choosing or developing a benchmarking task to evaluate on, I recommend focusing on picking ones that test the capabilities needed to solve the task of your interest, rather than the actual task itself. For example, if you are building a summarizer application that needs to perform a lot of logical reasoning to generate the summaries, it is better to focus on benchmark tests that directly test logical reasoning capabilities than ones that test summarization performance.

HuggingFace Open LLM Leaderboard

The Open LLM leaderboard uses Eleuther AI’s LM evaluation harness to evaluate the performance of models on 6 benchmark tasks. The 6 tasks are:

  1. MMLU (Massive Multitask Language Understanding) - This test evaluates the LLM on knowledge-intensive tasks, drawing from fields like US history, biology, mathematics and more than 50 other subjects in a multiple choice framework.

  2. ARC (AI2 Reasoning Challenge) - This test evaluates the LLM on multiple-choice grade school science questions that need complex reasoning as well as world knowledge to answer them.

  3. Hellaswag - This test evaluates commonsense reasoning by providing the LLM with a situation and asking it to predict what might happen next out of the given choices, based on commonsense.

  4. TruthfulQA - This test evaluates the LLM’s ability to provide answers that don’t contain falsehoods.

  5. Winogrande - This test consists of fill-in-the-blank questions that test commonsense reasoning.

  6. GSM8K - This test evaluates the LLM’s ability to complete grade school math problems involving a sequence of basic arithmetic operations.

Figure 4-4 shows a snapshot of the LLM leaderboard as of the day of the book’s writing. We can see that:

  • Larger models perform better.

  • Instruction-tuned or fine-tuned variants of models perform better.

Figure 4-4. Snapshot of the Open LLM Leaderboard

The validity of these benchmarks is in question, as complete test set decontamination is not guaranteed. Model providers are also optimizing their models to solve these benchmarks, reducing their value as reliable estimators of general-purpose performance.

HELM (Holistic Evaluation of Language Models)

HELM is an evaluation framework by Stanford that aims to calculate a wide variety of metrics over a range of benchmark tasks. 59 metrics are calculated overall, testing accuracy, calibration, robustness, fairness, bias, toxicity, efficiency, summarization performance, copyright infringement, disinformation, and more. The tasks tested include question answering, summarization, text classification, information retrieval, sentiment analysis, and toxicity detection.

Figure 4-5 shows a snapshot of the HELM leaderboard as of the day of the book’s writing. We can see that for a given task, the leaders differ across different evaluation criteria (efficiency, bias, accuracy, etc.).

Figure 4-5. Snapshot of the HELM Leaderboard

Elo Rating

Now that we have seen the limitations of quantitative evaluation, let us explore how we can most effectively incorporate human evaluations. One promising framework is the Elo rating system, used in chess to rank players.
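
As a refresher, here is how a single Elo update works. This is the standard chess formula, shown here only to illustrate how pairwise comparisons between models get turned into a ranking; the K-factor and ratings are illustrative.

def elo_update(rating_a, rating_b, score_a, k=32):
    # score_a is 1 if model A wins the comparison, 0 if it loses, 0.5 for a tie
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    return rating_a + k * (score_a - expected_a)

# Model A (rated 1200) beats model B (rated 1300) in a head-to-head comparison;
# A gains more than k/2 points because the win was an upset
print(elo_update(1200, 1300, 1))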

LMSYS ORG (Large Model Systems Organization) has implemented an evaluation platform based on the Elo rating system called the Chatbot Arena. Chatbot Arena solicits crowdsourced evaluations by inviting people to choose between two randomized and anonymized LLMs by chatting with them side-by-side. The leaderboard can be found here, with models like GPT-4 and Claude holding a clear advantage over the rest.

Figure 4-6 shows a snapshot of the Chatbot Arena leaderboard as of the day of the book’s writing. We can see that proprietary models by Open AI and Anthropic dominate the rankings, followed by chat-tuned models like Vicuna and Guanaco.

Figure 4-6. Snapshot of the Chatbot Arena Leaderboard

Interpreting benchmark results

How do you interpret evaluation results presented in research papers? Try to methodically ask as many questions as possible, and check if the answers are covered in the paper or other material. As an example, let us take the Llama2-chat evaluation graphs presented in the Llama2 paper. In particular, study Figures 1 and 3, which demonstrate how Llama2-chat compares with respect to helpfulness and safety against other chat models. Some of the questions that come to mind are:

  1. What does the evaluation dataset look like? Do we have access to it?

  2. What is the difficulty level of the test set? Maybe the model is competitive with respect to ChatGPT for easier examples but how does it perform with more difficult examples?

  3. What proportion of examples in the test set can be considered difficult?

  4. What are the kinds of scenarios covered in the test set? What degree of overlap do these scenarios have with the chat-tuning sets?

  5. What definition do they use for safety?

  6. Can there be a bias in the evaluation due to the fact that the models are evaluated on the basis of a particular definition of safety, which Llama2 was trained to adhere to, while other models may have different definitions of safety?

Rigorously interrogating the results this way helps you develop a deeper understanding of what is being evaluated, and whether it is in alignment with the capabilities you need from the language model for your own tasks.

Tip

For more rigorous LLM evaluation, I strongly recommend developing your own internal benchmarks. The book’s Github repo provides an example of how to develop one in the context of the Canadian parliamentary proceedings dataset. It also shows how to evaluate performance on generative tasks like summarization.

Warning

Do not trust evaluations performed by GPT-4 or any other LLM. We have no idea what criteria it uses for evaluation nor do we have a deeper understanding of the biases it possesses.

Robust evaluation of LLMs is further complicated by sensitivity to prompts and the probabilistic nature of generative models. For example, I often see papers claiming that GPT-4 does not have reasoning capabilities while not using any prompting techniques during evaluation. In many of these cases, it turns out that GPT-4 can in fact perform the task if prompted with chain-of-thought prompting. While evaluation prompts need not be heavily engineered, using rudimentary techniques like chain-of-thought should be standard practice; not using them means the model’s capabilities are being underestimated.

Loading LLMs

While it is possible to load and run inference on LLMs with just CPUs, you need GPUs if you want acceptable text generation speeds. Choosing a GPU depends on cost, the size of the model, whether you are training the model or just running inference, and support for optimizations. Tim Dettmers has developed a great flowchart that you can use to figure out which GPU best serves your needs.

Let’s figure out the amount of GPU RAM needed to load an LLM of a given size. LLMs can be loaded in various precisions:

  1. Float32 - 32-bit floating point representation, each parameter occupies 4 bytes of storage

  2. Float16 - 16-bit floating point representation. Only 5 bits are reserved for the exponent as opposed to 8 bits in Float32. This means that using Float16 comes with overflow/underflow problems for very large and small numbers.

  3. bfloat16 (BF16) - 16-bit floating point representation. Just like Float32, 8 bits are reserved for the exponent, thus alleviating the underflow/overflow problems observed in Float16

  4. Int8 - 8-bit integer representation. Running inference in 8-bit mode is around 20 percent slower than running in Float16

  5. FP8, FP4 - 8-bit and 4-bit floating point representation.

We will explore these formats in detail in Chapter 9. Generally, running inference on a model with 7B parameters will need around 7GB of GPU RAM if running in 8-bit mode, and around 14GB if running in BF16. If you intend to fine-tune the whole model, you will need a lot more memory. We will discuss the memory requirements for fine-tuning models in Chapter 6.
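
Here is a quick back-of-the-envelope calculation of the memory needed for the weights alone; actual usage will be higher once you account for the KV cache and activation memory. The helper function and numbers below are illustrative.

def weight_memory_gb(num_params_billion, bytes_per_param):
    # bytes_per_param: 4 for Float32, 2 for Float16/BF16, 1 for Int8, 0.5 for 4-bit
    return num_params_billion * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(7, 2))   # ~14 GB for a 7B model in BF16
print(weight_memory_gb(7, 1))   # ~7 GB for the same model in 8-bit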

HuggingFace Accelerate

You can run inference on models even if they don’t fit in the GPU RAM. The accelerate library by HuggingFace facilitates this by loading parts of the model into CPU RAM if the GPU RAM is filled up, and then offloading parts of the model to disk if the CPU RAM is also filled up. This video shows how accelerate operates under the hood. The whole process is abstracted from the user, so all you need to do to load a large model is run the following code:

!pip install transformers accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# device_map="auto" lets accelerate split the model across GPU, CPU RAM, and disk
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    device_map="auto",
    torch_dtype=torch.float16,
)


inputs = tokenizer("Language models are", return_tensors="pt")
gen_tokens = model.generate(**inputs, max_new_tokens=1)

Ollama

There are many tools available that facilitate loading LLMs locally, including on your own laptop. One such library is Ollama, which supports Windows, Mac and Linux operating systems. Using Ollama, you can load 13B models if your machine has at least 16GB of RAM. Ollama supports many open models like Mistral, Llama 2, Gemma etc. Ollama also provides a REST API that you can use to run inference and build LLM-driven applications. It also has several Terminal and UI integrations that enable you to build user-facing applications with ease.

Let’s see how we can use Google’s Gemma 2B model using Ollama. First, download the version of Ollama to your machine based on your operating system. Next, pull the Gemma model to your machine with

ollama pull gemma:2b

You can also create a Modelfile that contains configuration information for the model. This includes system prompts and prompt templates, decoding parameters like temperature, and conversation history. Refer to the documentation for a full list of available options.

An example Modelfile is

FROM gemma:2b

PARAMETER temperature 0.2

SYSTEM """
You are a provocateur who speaks only in limericks.
"""

After creating your Modelfile, you can run the model

ollama create local-gemma -f ./Modelfile
ollama run local-gemma
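
Once the model is running, you can also call Ollama’s local REST API, which listens on port 11434 by default. The snippet below is a minimal sketch using the /api/generate endpoint against the local-gemma model we just created; check Ollama’s API documentation for the full set of request fields.

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "local-gemma",                       # the model created from our Modelfile
        "prompt": "Write a limerick about tokenizers.",
        "stream": False,                              # return one JSON object instead of a stream
    },
)
print(response.json()["response"])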

The book’s Github repo contains a sample end-to-end application built using Ollama and one of its UI integrations. You can also experiment with similar tools like LM Studio and GPT4ALL.

Tip

You can load custom models using Ollama if they are in the GGUF (GPT-Generated Unified Format). We will learn more about GGUF and LLM inferencing on CPU using tools like llama.cpp in Chapter 9.

LLM Inference APIs

While you can deploy an LLM yourself, modern-day inference consists of so many optimizations, many of them proprietary, that it takes a lot of effort to bring your inference speeds up to par with commercially available solutions. Several inference services like Together AI exist that facilitate inference of open-source or custom models either through serverless endpoints or dedicated instances. Another option is HuggingFace’s TGI (Text Generation Inference), which has been recently reinstated to a permissive open-source license.

Decoding strategies

Now that we have learned how to load a model, let’s understand how to effectively generate text. To this end, several decoding strategies have been devised in the past few years. Let’s go through them in detail.

Greedy decoding

The simplest form of decoding is to just generate, at each step, the token that has the highest probability. The drawback of this approach is that it causes repetitiveness in the output. Here is an example:

inputs = tokenizer('The keyboard suddenly came to life. It ventured up the',
                   return_tensors='pt').to(torch_device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

You will notice that the output starts getting repetitive. Therefore, greedy decoding is not suitable unless you are generating really short sequences, like a single token representing a classification output.

Figure 4-7 shows an example of greedy decoding using the FLAN-T5 model. Note that we missed out on some great sequences because one of the desired tokens has slightly lower probability, ensuring it never gets picked.

Figure 4-7. Greedy decoding

Beam Search

An alternative to greedy decoding is called beam search. An important parameter of beam search is the beam size, n. At the first step, the top n tokens with the highest probabilities are selected as hypotheses. At each subsequent step, the model generates continuations for each hypothesis, and the n sequences with the highest cumulative probability are retained; at the end, the highest-scoring sequence is returned.

In HuggingFace, the num_beams parameter of the model.generate() function determines the size of the beam. Here is how the decoding code would look if we used beam search:

output = model.generate(**inputs, max_new_tokens=50, num_beams = 3)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Figure 4-8 shows an example of beam search using the FLAN-T5 model. Note that the repetitiveness problem hasn’t really been solved using beam search. Similar to greedy decoding, the generated text also sounds very constricted and un-humanlike, due to the complete absence of lower probability words.

To resolve these issues, we will need to start introducing some randomness and begin sampling from the probability distribution to ensure not just the top 2-3 tokens get generated all the time.

Top-K sampling

In top-k sampling, the model samples from only the K highest-probability tokens of the output distribution. The probability mass is redistributed over these K tokens, and the model samples from this renormalized distribution to generate the next token. HuggingFace provides the top_k parameter in its generate function.

output = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Figure 4-9 shows an example of top-k sampling using the FLAN-T5 model. Note that this is a vast improvement over greedy or beam search. However, top-k leads to problematic generations when the probability mass is dominated by just a few tokens, since a fixed K then forces tokens with very low probability into the candidate set.

Figure 4-9. Top-K Sampling

Top-P sampling

Top-p sampling solves the problem with top-k sampling by making the number of candidate tokens dynamic. Top-p involves choosing the smallest set of tokens whose cumulative probability exceeds a given threshold p. As seen earlier in the chapter, top-p sampling is available for GPT-3.5 and GPT-4 models. Here is how you can implement this in HuggingFace:

output = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Figure 4-10 shows an example of top-p sampling using the FLAN-T5 model. Top-p sampling, also called nucleus sampling, is the most popular sampling strategy used today.

Figure 4-10. Top-P Sampling
Note

So far, the decoding approaches we have seen operate serially; i.e. each token is generated one at a time, with a full pass through the model each time. This is too inefficient for latency-sensitive applications. In Chapter 9, we will discuss methods like speculative decoding, prompt lookup decoding and look-ahead decoding that can speed up the decoding process.

Running inference on LLMs

Now that we have learned how to access and load LLMs and understood the decoding process, let’s begin using them to solve our tasks. We call this LLM inference.

You will have noticed that LLM outputs are not consistent and sometimes even differ wildly across multiple generations for the same prompt. As we learned in the section on decoding, unless you are using greedy search or another deterministic algorithm, the LLM is sampling from a token distribution.

Some ways to make the generation more deterministic are to set the temperature to zero and to keep the random seed for sampling constant. Even then, you may not be able to guarantee the same (deterministic) output every time you send the LLM the same input.
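
As a minimal sketch with HuggingFace transformers, reusing the model, tokenizer, and inputs objects from the earlier examples, you can pin the sampling seed and use a temperature close to zero to make repeated generations far more (though not perfectly) repeatable:

from transformers import set_seed

set_seed(42)   # fixes the Python, NumPy, and torch RNGs used during sampling
output = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.1,   # close to zero: sharper distribution, more predictable output
)
print(tokenizer.decode(output[0], skip_special_tokens=True))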

Sources of non-determinism range from multi-threading to floating-point rounding errors to the use of certain model architectures (for example, it is known that the Sparse MoE architecture produces non-deterministic outputs).

Reducing the temperature to zero or close to zero impacts the LLM’s creativity and makes its outputs more predictable, which might not be suitable for many applications.

In production settings where reliability is important, you should run multiple generations for the same input and use a technique like majority voting or heuristics to select the right output. This is very important since, by the nature of the decoding process, the wrong token can sometimes be generated, and since every generated token is a function of the tokens generated before it, the error can propagate far ahead.

Self-consistency is a popular prompting technique that uses majority voting in conjunction with CoT (chain-of-thought) prompting. In this technique, we add the CoT prompt ‘Let’s think step by step’ to the input and run multiple generations (reasoning paths). We then use majority voting to select the correct output.
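
Here is a minimal sketch of self-consistency, again reusing the model and tokenizer objects from the earlier examples. The prompt and the naive answer-extraction step are simplified assumptions and would need to match your actual task.

from collections import Counter

question = ("If there are 3 cars and each car has 4 wheels, how many wheels "
            "are there in total? Let's think step by step.")
inputs = tokenizer(question, return_tensors="pt")

answers = []
for _ in range(5):   # sample several independent reasoning paths
    output = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    answers.append(text.strip().split()[-1])   # naive extraction: take the last token of the answer

final_answer, votes = Counter(answers).most_common(1)[0]
print(final_answer, votes)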

Structured outputs

We might want the output of the LLM to be in some structured format, so that it can be consumed by other software systems. But this is easier said than done; current LLMs aren’t as controllable as we would like them to be. Some LLMs can be excessively chatty. Ask them to give a Yes/No answer and they respond with The answer to this question is ‘Yes’.

One way to get structured outputs from the LLM is to define a JSON schema, provide the schema to the LLM, and prompt it to generate outputs adhering to the schema. For larger models, this works almost all the time, with occasional schema corruption errors that you can catch and handle.
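
A minimal sketch of this prompt-and-validate loop is shown below. The schema and the retry logic are illustrative assumptions, and generate_text is a hypothetical wrapper standing in for whichever LLM call you are using.

import json

schema = {
    "type": "object",
    "properties": {"sentiment": {"type": "string"}, "confidence": {"type": "number"}},
}

prompt = (
    "Extract the sentiment of the review below. "
    f"Respond with JSON matching this schema and nothing else: {json.dumps(schema)}\n"
    "Review: The pizza was ok but the service was terrible."
)

result = None
for _ in range(3):                      # retry a few times on schema corruption
    raw = generate_text(prompt)         # hypothetical wrapper around your LLM call
    try:
        result = json.loads(raw)        # raises if the output is not valid JSON
        break
    except json.JSONDecodeError:
        continue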

For smaller models, you can use libraries like JSONformer. JSONformer delegates the generation of the content tokens to the LLM, but fills in the fixed JSON scaffolding (braces, quotes, and key names) itself. JSONformer is built on top of HuggingFace and thus supports any model that is supported by HuggingFace.
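
Here is roughly what JSONformer usage looks like with a HuggingFace model and tokenizer like the ones loaded earlier; treat the exact class name and arguments as an assumption to verify against the library’s README.

from jsonformer import Jsonformer

json_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "num_parameters_billion": {"type": "number"}},
}

jsonformer = Jsonformer(model, tokenizer, json_schema,
                        "Generate a JSON description of an open-source LLM:")
print(jsonformer())   # returns a Python dict that already conforms to the schema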

More advanced structured outputs can be facilitated by using libraries like LMQL or Guidance. These libraries provide a programming paradigm for prompting and facilitate controlled generation.

Some of the features available through these libraries are:

  1. Restricting output to a finite set of tokens: This is useful for classification problems, where you have a finite set of output labels. For example, you can restrict the output to be either Positive, Negative, or Neutral for a sentiment analysis task (see the sketch after this list).

  2. Controlling output format using regular expressions: For example, you can use regular expressions to specify a custom date format.

  3. Controlling output format using context-free grammars (CFGs): A CFG defines the rules that generated strings need to adhere to. For more background on CFGs, refer to Aditya’s blog. Using CFGs, we can use LLMs to more effectively solve sequence tagging tasks like named entity recognition or part-of-speech tagging.
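
To illustrate the first feature without depending on a specific library, here is a minimal sketch that restricts the output to a fixed label set by scoring each candidate label with the model and picking the most likely one. This is not how LMQL or Guidance actually work (they typically mask disallowed tokens during decoding); it is just a simple way to get the same effect with plain transformers, assuming a decoder-only (causal) model and the tokenizer/model objects loaded earlier.

import torch

def choose_label(prompt, labels):
    scores = {}
    for label in labels:
        enc = tokenizer(prompt + " " + label, return_tensors="pt")
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        # out.loss is the average negative log-likelihood of the sequence;
        # note the averaging slightly favors shorter labels, fine for a sketch
        scores[label] = -out.loss.item()
    return max(scores, key=scores.get)

print(choose_label("The sentiment of 'the service was terrible' is:",
                   ["Positive", "Negative", "Neutral"]))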

Model debugging and interpretability

Now that we are comfortable with loading LLMs and generating text with them, we would like to be able to understand model behavior and explore the examples on which the model fails. Interpretability in LLMs is much less developed than in other areas of machine learning. However, we can get partial interpretability by exploring how the output changes upon minor variations in the input, and by analyzing the intermediate outputs as the inputs propagate through the Transformer architecture.

Google’s open-source LIT-NLP is a handy tool that supports visualizations of model behavior as well as various debugging workflows.

Figure 4-11 shows an example of LIT-NLP in action, providing interpretability for a T5 model running a summarization task.

Figure 4-11. LIT-NLP

Here are some features available in LIT-NLP that help you debug your models:

  • Visualization of the attention mechanism.

  • Salience maps, which show the parts of the input that the model pays the most attention to.

  • Visualization of embeddings.

  • Counterfactual analysis that shows how your model behavior changes after a change to the input like adding or removing a token.

For more details on using LIT-NLP for error analysis, refer to Google’s tutorial on using LIT-NLP with the Gemma LLM where they find errors in few-shot prompts by analyzing incorrect examples and observing which parts of the prompt contributed most towards the output (salience).

Summary

In this chapter, we journeyed through the LLM landscape and took note of the various options we have at our disposal. We learned how to determine the criteria most relevant to our tasks and choose the right LLM accordingly. We explored various LLM benchmarks and showed how to interpret their results. We learned how to load LLMs and run inference on them, along with efficient decoding strategies. Finally, we showcased interpretability tools like LIT-NLP that can help us understand what is going on behind the scenes in the Transformer architecture.

In the next chapter, we will learn how to update a model to improve its performance on our tasks of interest. We will walk through a full-fledged fine-tuning example and explore the hyperparameter tuning decisions involved. We will also learn how to construct training datasets for fine-tuning.

