Chapter 2. Tokens and Embeddings

Tokens and embeddings are two of the central concepts of using large language models (LLMs). As we saw in the first chapter, they are important not only for understanding the history of Language AI; without a good grasp of tokens and embeddings, we cannot have a clear picture of how LLMs work, how they are built, and where they are headed in the future, as we can see in Figure 2-1.

Figure 2-1. Language models deal with text in small chunks called tokens. For the language model to compute language, it needs to turn tokens into numeric representations called embeddings.
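
To make the figure concrete, here is a minimal sketch of that two-step process, assuming the Hugging Face transformers library and the gpt2 checkpoint (an illustrative choice, not a model covered in this chapter):

    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2")

    text = "Language models deal with text in small chunks called tokens."

    # Step 1: break the text into tokens and map each token to an integer ID.
    token_ids = tokenizer(text, return_tensors="pt").input_ids
    print(tokenizer.convert_ids_to_tokens(token_ids[0].tolist()))  # the tokens
    print(token_ids)                                                # their integer IDs

    # Step 2: look up the embedding vector that represents each token ID.
    embeddings = model.get_input_embeddings()(token_ids)
    print(embeddings.shape)  # (1, number_of_tokens, embedding_dimension)

Each row of the resulting tensor is the numeric representation of one token; the model computes over these vectors rather than over the raw text.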

In this chapter, we look more closely at what tokens are and the tokenization methods used to power LLMs. We then dive into the famous word2vec embedding method that preceded modern-day LLMs, and see how the concept of token embeddings has been extended to build commercial recommendation systems that power many of the apps you use. Finally, we move from token embeddings to sentence or text embeddings, where a whole sentence or document is represented by a single vector, enabling applications like semantic search and topic modeling that we see in Part II of this book.
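
As a preview of that last idea, here is a minimal sketch of a text embedding, assuming the sentence-transformers library and the all-MiniLM-L6-v2 model (an illustrative choice, not necessarily the model used later in the book):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # The whole sentence is represented by a single fixed-size vector.
    vector = model.encode("Tokens and embeddings are central concepts of LLMs.")
    print(vector.shape)  # e.g., (384,) for this model

Comparing such vectors, for example with cosine similarity, is what makes applications like semantic search possible.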

LLM Tokenization

At the time of this writing, the way the majority of people interact with language models is through a web playground that presents a chat interface between the user and the model. You may ...
