book

Hands-On Large Language Models

by Jay Alammar, Maarten Grootendorst

September 2024

Beginner to intermediate

428 pages

10h 29m

English

O'Reilly Media, Inc.

Book available

Read now

Unlock full access

Includes

Includes Quizzes

An Intuition-First PhilosophyPrerequisitesBook StructurePart I: Understanding Language ModelsPart II: Using Pretrained Language ModelsPart III: Training and Fine-Tuning Language ModelsHardware and Software RequirementsAPI KeysConventions Used in This BookUsing Code ExamplesO’Reilly Online LearningHow to Contact UsAcknowledgments
What Is Language AI?A Recent History of Language AIRepresenting Language as a Bag-of-WordsBetter Representations with Dense Vector EmbeddingsTypes of EmbeddingsEncoding and Decoding Context with AttentionAttention Is All You NeedRepresentation Models: Encoder-Only ModelsGenerative Models: Decoder-Only ModelsThe Year of Generative AIThe Moving Definition of a “Large Language Model”The Training Paradigm of Large Language ModelsLarge Language Model Applications: What Makes Them So Useful?Responsible LLM Development and UsageLimited Resources Are All You NeedInterfacing with Large Language ModelsProprietary, Private ModelsOpen ModelsOpen Source FrameworksGenerating Your First TextSummary
LLM TokenizationHow Tokenizers Prepare the Inputs to the Language ModelDownloading and Running an LLMHow Does the Tokenizer Break Down Text?Word Versus Subword Versus Character Versus Byte TokensComparing Trained LLM TokenizersTokenizer PropertiesToken EmbeddingsA Language Model Holds Embeddings for the Vocabulary of Its TokenizerCreating Contextualized Word Embeddings with Language ModelsText Embeddings (for Sentences and Whole Documents)Word Embeddings Beyond LLMsUsing pretrained Word EmbeddingsThe Word2vec Algorithm and Contrastive TrainingEmbeddings for Recommendation SystemsRecommending Songs by EmbeddingsTraining a Song Embedding ModelSummary
An Overview of Transformer ModelsThe Inputs and Outputs of a Trained Transformer LLMThe Components of the Forward PassChoosing a Single Token from the Probability Distribution (Sampling/Decoding)Parallel Token Processing and Context SizeSpeeding Up Generation by Caching Keys and ValuesInside the Transformer BlockRecent Improvements to the Transformer ArchitectureMore Efficient AttentionThe Transformer BlockPositional Embeddings (RoPE)Other Architectural Experiments and ImprovementsSummary
The Sentiment of Movie ReviewsText Classification with Representation ModelsModel SelectionUsing a Task-Specific ModelClassification Tasks That Leverage EmbeddingsSupervised ClassificationWhat If We Do Not Have Labeled Data?Text Classification with Generative ModelsUsing the Text-to-Text Transfer TransformerChatGPT for ClassificationSummary
ArXiv’s Articles: Computation and LanguageA Common Pipeline for Text ClusteringEmbedding DocumentsReducing the Dimensionality of EmbeddingsCluster the Reduced EmbeddingsInspecting the ClustersFrom Text Clustering to Topic ModelingBERTopic: A Modular Topic Modeling FrameworkAdding a Special Lego BlockThe Text Generation Lego BlockSummary
Using Text Generation ModelsChoosing a Text Generation ModelLoading a Text Generation ModelControlling Model OutputIntro to Prompt EngineeringThe Basic Ingredients of a PromptInstruction-Based PromptingAdvanced Prompt EngineeringThe Potential Complexity of a PromptIn-Context Learning: Providing ExamplesChain Prompting: Breaking up the ProblemReasoning with Generative ModelsChain-of-Thought: Think Before AnsweringSelf-Consistency: Sampling OutputsTree-of-Thought: Exploring Intermediate StepsOutput VerificationProviding ExamplesGrammar: Constrained SamplingSummary
Model I/O: Loading Quantized Models with LangChainChains: Extending the Capabilities of LLMsA Single Link in the Chain: Prompt TemplateA Chain with Multiple PromptsMemory: Helping LLMs to Remember ConversationsConversation BufferWindowed Conversation BufferConversation SummaryAgents: Creating a System of LLMsThe Driving Power Behind Agents: Step-by-step ReasoningReAct in LangChainSummary

Overview of Semantic Search and RAGSemantic Search with Language ModelsDense RetrievalRerankingRetrieval Evaluation MetricsRetrieval-Augmented Generation (RAG)From Search to RAGExample: Grounded Generation with an LLM APIExample: RAG with Local ModelsAdvanced RAG TechniquesRAG EvaluationSummary
Transformers for VisionMultimodal Embedding ModelsCLIP: Connecting Text and ImagesHow Can CLIP Generate Multimodal Embeddings?OpenCLIPMaking Text Generation Models MultimodalBLIP-2: Bridging the Modality GapPreprocessing Multimodal InputsUse Case 1: Image CaptioningUse Case 2: Multimodal Chat-Based PromptingSummary
Embedding ModelsWhat Is Contrastive Learning?SBERTCreating an Embedding ModelGenerating Contrastive ExamplesTrain ModelIn-Depth EvaluationLoss FunctionsFine-Tuning an Embedding ModelSupervisedAugmented SBERTUnsupervised LearningTransformer-Based Sequential Denoising Auto-EncoderUsing TSDAE for Domain AdaptationSummary
Supervised ClassificationFine-Tuning a Pretrained BERT ModelFreezing LayersFew-Shot ClassificationSetFit: Efficient Fine-Tuning with Few Training ExamplesFine-Tuning for Few-Shot ClassificationContinued Pretraining with Masked Language ModelingNamed-Entity RecognitionPreparing Data for Named-Entity RecognitionFine-Tuning for Named-Entity RecognitionSummary
The Three LLM Training Steps: Pretraining, Supervised Fine-Tuning, and Preference TuningSupervised Fine-Tuning (SFT)Full Fine-TuningParameter-Efficient Fine-Tuning (PEFT)Instruction Tuning with QLoRATemplating Instruction DataModel QuantizationLoRA ConfigurationTraining ConfigurationTrainingMerge WeightsEvaluating Generative ModelsWord-Level MetricsBenchmarksLeaderboardsAutomated EvaluationHuman EvaluationPreference-Tuning / Alignment / RLHFAutomating Preference Evaluation Using Reward ModelsThe Inputs and Outputs of a Reward ModelTraining a Reward ModelTraining No Reward ModelPreference Tuning with DPOTemplating Alignment DataModel QuantizationTraining ConfigurationTrainingSummary

Content preview from Hands-On Large Language Models

Chapter 1. An Introduction to Large Language Models

Humanity is at an inflection point. From 2012 onwards, developments in building AI systems (using deep neural networks) accelerated so that by the end of the decade, they yielded the first software system able to write articles indiscernible from those written by humans. This system was an AI model called Generative Pre-trained Transformer 2, or GPT-2. 2022 marked the release of ChatGPT, which demonstrated how profoundly this technology was poised to revolutionize how we interact with technology and information. Reaching one million active users in five days and then one hundred million active users in two months, the new breed of AI models started out as human-like chatbots but quickly evolved into a monumental shift in our approach to common tasks, like translation, text generation, summarization, and more. It became an invaluable tool for programmers, educators, and researchers.

The success of ChatGPT was unprecedented and popularized more research into the technology behind it, namely large language models (LLMs). Both proprietary and public models were being released at a steady pace, closing in on, and eventually catching up to the performance of ChatGPT. It is not an exaggeration to state that almost all attention was on LLMs.

As a result, 2023 will always be known, at least to us, as the year that drastically changed our field, Language Artificial Intelligence (Language AI), a field characterized by the development of ...