Chapter 3. Looking Inside Large Language Models
Now that we have a sense of tokenization and embeddings, we’re ready to dive deeper into the language model itself and see how it works. In this chapter, we’ll look at some of the main intuitions behind how Transformer language models work. Our focus will be on text generation models so that we get a deeper sense of generative LLMs in particular.
We’ll look at both the concepts and some code examples that demonstrate them. Let’s start by loading a language model and getting it ready for generation by declaring a pipeline. On a first read, feel free to skip the code and focus on grasping the concepts involved. On a second read, the code will help you start applying these concepts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=50,
    do_sample=False,
)
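As a quick check that the pipeline is wired up correctly, you can call the generator on a short prompt. This is just a sketch; the prompt below is an illustrative placeholder, and the exact completion will depend on the model.

# Quick sanity check: generate a short completion
# (the prompt here is only a placeholder; any text works)
prompt = "Write a one-sentence summary of what a language model does."

# The pipeline returns a list of dicts; with return_full_text=False,
# "generated_text" contains only the newly generated tokens
output = generator(prompt)
print(output[0]["generated_text"])

Because we set do_sample=False, the model decodes greedily, so running the same prompt again produces the same output.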
An Overview of Transformer Models
Let’s begin our exploration with a high-level overview of the model, and then we’ll see how later work has improved upon the Transformer model since its introduction in 2017.
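One simple way to get such a high-level view is to print the model object we loaded earlier. This lists its submodules, such as the embedding layer, the stack of decoder blocks, and the language modeling head; the exact module names vary from model to model.

# Printing the model shows its nested modules: the token embedding layer,
# the stack of Transformer decoder blocks, and the final language modeling
# head that maps hidden states back to scores over the vocabulary
print(model)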
The Inputs and Outputs of a Trained Transformer ...