Chapter 12. Fine-Tuning Generation Models
In this chapter, we will take a pretrained text generation model and go over the process of fine-tuning it. This fine-tuning step is key to producing high-quality models and an important tool in our toolbox: it allows us to adapt a model to a specific dataset, domain, or desired behavior.
Throughout this chapter, we will guide you through the two most common methods for fine-tuning text generation models: supervised fine-tuning and preference tuning. We will explore how fine-tuning pretrained text generation models can make them more effective tools for your application.
The Three LLM Training Steps: Pretraining, Supervised Fine-Tuning, and Preference Tuning
There are three common steps that lead to creating a high-quality LLM:
1. Language modeling (pretraining)
2. Supervised fine-tuning
3. Preference tuning
The first step in creating a high-quality LLM is to pretrain it on one or more massive text datasets (Figure 12-1). During pretraining, the model attempts to predict the next token, and in doing so learns the linguistic and semantic representations found in the text. As we saw in Chapters 3 and 11, this is called language modeling and is a self-supervised method.
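To make this concrete, the following is a minimal sketch of the language modeling objective using the Hugging Face transformers library. The model name gpt2 is just an illustrative choice of a small base model, not one used in this chapter:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

# Passing the input ids as labels makes the model compute the
# next-token (cross-entropy) loss; the one-position shift between
# inputs and targets happens inside the model.
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"Language modeling loss: {outputs.loss.item():.2f}")

# One optimizer step on this loss is one step of pretraining.
outputs.loss.backward()

Pretraining simply repeats this next-token prediction step over billions of tokens of raw text. No labels are needed beyond the text itself, which is what makes the method self-supervised.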
This produces a base model, also commonly referred to as a pretrained or foundation model. Base models are a key artifact of the training process, but they are harder for the end user to work with: they complete text rather than follow instructions. This is why the next step is important.
Figure 12-1. During language modeling, the LLM ...
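To see why base models can be hard to work with directly, the sketch below (again using gpt2 as a hypothetical stand-in) prompts a base model with an instruction. Because the model was only trained to continue text, it will typically carry on in the same style rather than carry out the instruction:

from transformers import pipeline

# A base model only continues text; it has not been trained to
# follow instructions.
generator = pipeline("text-generation", model="gpt2")

prompt = "Write a short poem about autumn."
output = generator(prompt, max_new_tokens=30, do_sample=True)
print(output[0]["generated_text"])
# Expect a continuation of the prompt (for example, more
# instruction-like text), not an actual poem.

Closing this gap between completing text and following instructions is exactly what the fine-tuning steps in the rest of this chapter address.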