Chapter 4. Text Classification

A common task in natural language processing is classification. The goal of the task is to train a model to assign a label or class to some input text (see Figure 4-1). Classifying text is used across the world for a wide range of applications, from sentiment analysis and intent detection to extracting entities and detecting language. The impact of language models, both representation and generative, on classification cannot be overstated.

Figure 4-1. Using a language model to classify text.

In this chapter, we will discuss several ways to use language models for classifying text. It serves as an accessible introduction to working with language models that have already been trained. Because text classification is such a broad field, the techniques we cover also double as a way to explore the landscape of language models.

We will focus on leveraging pretrained language models: models that have already been trained on large amounts of data and that we can use to classify text. As illustrated in Figure 4-2, we will examine both representation and generative models and explore their differences.

Figure 4-2. Although both representation and generative models can be used for classification, their approaches differ.

This chapter serves as an introduction to a variety of language models, both generative and nongenerative. We will encounter common packages for loading and using these models.

Tip

Although this book focuses on LLMs, it is highly advised to compare these examples against classic but strong baselines, such as representing text with TF-IDF and training a logistic regression classifier on top of that representation.
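
As a point of reference, a minimal sketch of such a baseline is shown below. It is not part of the chapter’s main examples and simply assumes the rotten_tomatoes dataset that we introduce in the next section:

from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Load the data used throughout this chapter
data = load_dataset("rotten_tomatoes")

# Represent the reviews with TF-IDF features
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(data["train"]["text"])
X_test = vectorizer.transform(data["test"]["text"])

# Train a logistic regression on top of the TF-IDF features
clf = LogisticRegression(random_state=42)
clf.fit(X_train, data["train"]["label"])

# Report the weighted F1 score on the test split
y_pred = clf.predict(X_test)
print(f1_score(data["test"]["label"], y_pred, average="weighted"))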

The Sentiment of Movie Reviews

You can find the data we use to explore techniques for classifying text on the Hugging Face Hub, a platform for hosting models as well as datasets. We will use the well-known “rotten_tomatoes” dataset to train and evaluate our models.1 It contains 5,331 positive and 5,331 negative movie reviews from Rotten Tomatoes.

To load this data, we make use of the datasets package, which will be used throughout the book:

from datasets import load_dataset

# Load our data
data = load_dataset("rotten_tomatoes")
data
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})

The data is split up into train, test, and validation splits. Throughout this chapter, we will use the train split when we train a model and the test split for validating the results. Note that the additional validation split can be used to further validate generalization if you used the train and test splits to perform hyperparameter tuning.

Let’s take a look at some examples in our train split:

data["train"][0, -1]
{'text': ['the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .',
  'things really get weird , though not particularly scary : the movie is all portent and no content .'],
 'label': [1, 0]}

These short reviews are either labeled as positive (1) or negative (0). This means that we will focus on binary sentiment classification.
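
If you want to verify that the labels are indeed balanced, a quick count over the train split will do. This snippet is merely a sanity check and not required for what follows:

from collections import Counter

# Count how many negative (0) and positive (1) reviews the train split contains
Counter(data["train"]["label"])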

Text Classification with Representation Models

Classification with pretrained representation models generally comes in two flavors: using either a task-specific model or an embedding model. As we explored in the previous chapter, these models are created by fine-tuning a foundation model, like BERT, on a specific downstream task, as illustrated in Figure 4-3.

Figure 4-3. A foundation model is fine-tuned for specific tasks; for instance, to perform classification or generate general-purpose embeddings.

A task-specific model is a representation model, such as BERT, trained for a specific task, like sentiment analysis. As we explored in Chapter 1, an embedding model generates general-purpose embeddings that can be used for a variety of tasks not limited to classification, like semantic search (see Chapter 8).

The process of fine-tuning a BERT model for classification is covered in Chapter 11 while creating an embedding model is covered in Chapter 10. In this chapter, we keep both models frozen (nontrainable) and only use their output as shown in Figure 4-4.

Figure 4-4. Perform classification directly with a task-specific model or indirectly with general-purpose embeddings.

We will leverage pretrained models that others have already fine-tuned for us and explore how they can be used to classify our selected movie reviews.

Model Selection

Choosing the right model is not as straightforward as you might think, with over 60,000 models for text classification and more than 8,000 models for generating embeddings on the Hugging Face Hub at the time of writing. Moreover, it’s crucial to select a model that fits your use case, taking into account its language compatibility, underlying architecture, size, and performance.

Let’s start with the underlying architecture. As we explored in Chapter 1, BERT, a well-known encoder-only architecture, is a popular choice for creating task-specific and embedding models. While generative models, like the GPT family, are incredible models, encoder-only models similarly excel in task-specific use cases and tend to be significantly smaller in size.

Over the years, many variations of BERT have been developed, including RoBERTa,2 DistilBERT,3 ALBERT,4 and DeBERTa,5 each trained in various contexts. You can find an overview of some well-known BERT-like models in Figure 4-5.

Figure 4-5. A timeline of common BERT-like model releases. These are considered foundation models and are mostly intended to be fine-tuned on a downstream task.

Selecting the right model for the job can be a form of art in itself. Trying the thousands of pretrained models that can be found on the Hugging Face Hub is not feasible, so we need to be efficient with the models that we choose. Having said that, several models are great starting points and give you an idea of the base performance of these kinds of models. Consider them solid baselines:

For the task-specific model, we are choosing the Twitter-RoBERTa-base for Sentiment Analysis model. This is a RoBERTa model fine-tuned on tweets for sentiment analysis. Although this was not trained specifically for movie reviews, it is interesting to explore how this model generalizes.

When selecting models to generate embeddings from, the MTEB leaderboard is a great place to start. It contains open and closed source models benchmarked across several tasks. Make sure to not only take performance into account: the importance of inference speed should not be underestimated in real-life solutions. As such, we will use sentence-transformers/all-mpnet-base-v2 as the embedding model throughout this section. It is a small but performant model.

Using a Task-Specific Model

Now that we have selected our task-specific representation model, let’s start by loading our model:

from transformers import pipeline

# Path to our HF model
model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"

# Load model into pipeline
pipe = pipeline(
    model=model_path,
    tokenizer=model_path,
    return_all_scores=True,
    device="cuda:0"
)

As we load our model, we also load the tokenizer, which is responsible for converting input text into individual tokens, as illustrated in Figure 4-6. Although passing the tokenizer explicitly is not needed, as it is loaded automatically, doing so illustrates what is happening under the hood.

Figure 4-6. An input sentence is first fed to a tokenizer before it can be processed by the task-specific model.

These tokens are at the core of most language models, as explored in depth in Chapter 2. A major benefit of these tokens is that they can be combined to generate representations even if they were not in the training data, as shown in Figure 4-7.

Figure 4-7. By breaking down an unknown word into tokens, word embeddings can still be generated.
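
To see the tokenizer in action, we can load it separately and inspect how it splits a word. The exact subword tokens you get depend on the model’s vocabulary, so treat the example word below as illustrative:

from transformers import AutoTokenizer

# Load the tokenizer that belongs to our task-specific model
tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment-latest")

# A word that is not in the vocabulary is broken down into smaller subword tokens
print(tokenizer.tokenize("unwatchable"))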

After loading all the necessary components, we can go ahead and use our model on the test split of our data:

import numpy as np
from tqdm import tqdm
from transformers.pipelines.pt_utils import KeyDataset

# Run inference
y_pred = []
for output in tqdm(pipe(KeyDataset(data["test"], "text")), total=len(data["test"])):
    # The model returns three scores per review: negative (index 0), neutral (index 1), and positive (index 2)
    negative_score = output[0]["score"]
    positive_score = output[2]["score"]
    # Ignore the neutral score and pick the more likely of negative (0) and positive (1)
    assignment = np.argmax([negative_score, positive_score])
    y_pred.append(assignment)

Now that we have generated our predictions, all that is left is evaluation. We create a small function that we can easily use throughout this chapter:

from sklearn.metrics import classification_report

def evaluate_performance(y_true, y_pred):
    """Create and print the classification report"""
    performance = classification_report(
        y_true, y_pred,
        target_names=["Negative Review", "Positive Review"]
    )
    print(performance)

Next, let’s create our classification report:

evaluate_performance(data["test"]["label"], y_pred)
                precision    recall  f1-score   support

Negative Review       0.76      0.88      0.81       533
Positive Review       0.86      0.72      0.78       533

       accuracy                           0.80      1066
      macro avg       0.81      0.80      0.80      1066
   weighted avg       0.81      0.80      0.80      1066

To read the resulting classification report, let’s first explore how we can identify correct and incorrect predictions. There are four combinations depending on whether the prediction is correct (True) versus incorrect (False) and whether the predicted class is the positive class (Positive) versus the negative class (Negative). We can illustrate these combinations as a matrix, commonly referred to as a confusion matrix, as shown in Figure 4-8.

Figure 4-8. The confusion matrix describes four types of predictions we can make.

Using the confusion matrix, we can derive several formulas to describe the quality of the model. In the previously generated classification report we can see four such methods, namely precision, recall, accuracy, and the F1 score:

  • Precision measures how many of the items flagged as relevant actually are relevant, which indicates the accuracy of the results that are found.

  • Recall measures how many of the relevant items were found, which indicates the model’s ability to find all relevant results.

  • Accuracy refers to how many correct predictions the model makes out of all predictions, which indicates the overall correctness of the model.

  • The F1 score balances precision and recall into a single measure of a model’s overall performance.

These four metrics are illustrated in Figure 4-9, which describes them using the aforementioned classification report.

Figure 4-9. The classification report describes several metrics for evaluating a model’s performance.
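
To make these definitions concrete, the sketch below computes each metric from the four cells of the confusion matrix, using hypothetical counts:

# Hypothetical counts taken from a confusion matrix
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)                          # Of all predicted positives, how many are correct
recall = tp / (tp + fn)                             # Of all actual positives, how many did we find
accuracy = (tp + tn) / (tp + fp + fn + tn)          # Fraction of all predictions that are correct
f1 = 2 * precision * recall / (precision + recall)  # Harmonic mean of precision and recall

print(precision, recall, accuracy, f1)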

We will report the weighted average of the F1 score throughout the examples in this book; since our classes are balanced here, each class contributes equally to it. Our pretrained Twitter-RoBERTa model gives us an F1 score of 0.80 (we are reading this from the weighted avg row and the f1-score column), which is great for a model not trained specifically on our domain data!

To improve the performance of our selected model, we could do a few different things, including selecting a model trained on our domain data (movie reviews, in this case), like DistilBERT base uncased finetuned SST-2. We could also shift our focus to another flavor of representation models, namely embedding models.
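
As a sketch of that first option, swapping in such a domain model mostly comes down to changing the model path. Note that this particular model returns two labels instead of three, so the mapping to 0 and 1 differs slightly; the snippet below reuses the pipeline, KeyDataset, and evaluate_performance objects defined earlier:

# Load a model fine-tuned for movie review sentiment (SST-2)
sst_pipe = pipeline(
    model="distilbert-base-uncased-finetuned-sst-2-english",
    return_all_scores=True,
    device="cuda:0"
)

# Run inference; this model outputs "NEGATIVE" and "POSITIVE" scores
y_pred = []
for output in tqdm(sst_pipe(KeyDataset(data["test"], "text")), total=len(data["test"])):
    scores = {pred["label"]: pred["score"] for pred in output}
    y_pred.append(int(scores["POSITIVE"] > scores["NEGATIVE"]))

evaluate_performance(data["test"]["label"], y_pred)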

Classification Tasks That Leverage Embeddings

In the previous example, we used a pretrained task-specific model for sentiment analysis. However, what if we cannot find a model that was pretrained for this specific task? Do we need to fine-tune a representation model ourselves? The answer is no!

There might be times when you want to fine-tune the model yourself if you have sufficient computing available (see Chapter 11). However, not everyone has access to extensive computing. This is where general-purpose embedding models come in.

Supervised Classification

Unlike the previous example, we can perform part of the training process ourselves by approaching it from a more classical perspective. Instead of directly using the representation model for classification, we will use an embedding model for generating features. Those features can then be fed into a classifier, thereby creating a two-step approach as shown in Figure 4-10.

Figure 4-10. The feature extraction step and classification steps are separated.

A major benefit of this separation is that we do not need to fine-tune our embedding model, which can be costly. Instead, we can train a classifier, such as a logistic regression, on the CPU.

In the first step, we convert our textual input to embeddings using the embedding model as shown in Figure 4-11. Note that this model is similarly kept frozen and is not updated during the training process.

Figure 4-11. In step 1, we use the embedding model to extract the features and convert the input text to embeddings.

We can perform this step with sentence-transformers, a popular package for leveraging pretrained embedding models.6 Creating the embeddings is straightforward:

from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Convert text to embeddings
train_embeddings = model.encode(data["train"]["text"], show_progress_bar=True)
test_embeddings = model.encode(data["test"]["text"], show_progress_bar=True)

As we covered in Chapter 1, these embeddings are numerical representations of the input text. The number of values, or dimension, of the embedding depends on the underlying embedding model. Let’s explore that for our model:

train_embeddings.shape
(8530, 768)

This shows that each of our 8,530 input documents has an embedding dimension of 768 and therefore each embedding contains 768 numerical values.

In the second step, these embeddings serve as the input features to the classifier, as illustrated in Figure 4-12. The classifier is trainable and is not limited to logistic regression; it can take any form as long as it performs classification.

Figure 4-12. Using the embeddings as our features, we train a logistic regression model on our training data.

We will keep this step straightforward and use a logistic regression as the classifier. To train it, we only need to use the generated embeddings together with our labels:

from sklearn.linear_model import LogisticRegression

# Train a logistic regression on our train embeddings
clf = LogisticRegression(random_state=42)
clf.fit(train_embeddings, data["train"]["label"])

Next, let’s evaluate our model:

# Predict previously unseen instances
y_pred = clf.predict(test_embeddings)
evaluate_performance(data["test"]["label"], y_pred)
              precision    recall  f1-score   support

Negative Review       0.85      0.86      0.85       533
Positive Review       0.86      0.85      0.85       533

       accuracy                           0.85      1066
      macro avg       0.85      0.85      0.85      1066
   weighted avg       0.85      0.85      0.85      1066

By training a classifier on top of our embeddings, we managed to get an F1 score of 0.85! This demonstrates the possibilities of training a lightweight classifier while keeping the underlying embedding model frozen.

Tip

In this example, we used sentence-transformers to extract our embeddings, which benefits from a GPU to speed up inference. However, we can remove this GPU dependency by using an external API to create the embeddings. Popular choices for generating embeddings are Cohere’s and OpenAI’s offerings. As a result, this would allow the pipeline to run entirely on the CPU.
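
As a rough sketch of that API-based approach, generating embeddings without a local GPU could look as follows. The model name below is an assumption on our part and OpenAI’s offerings may change, so check their documentation for current options:

import numpy as np
import openai

client = openai.OpenAI(api_key="YOUR_KEY_HERE")

def embed(texts, model="text-embedding-3-small", batch_size=512):
    """Embed a list of texts in batches through OpenAI's embedding endpoint."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        response = client.embeddings.create(input=texts[i:i + batch_size], model=model)
        embeddings.extend(item.embedding for item in response.data)
    return np.array(embeddings)

# These embeddings could replace the sentence-transformers output above
train_embeddings = embed(data["train"]["text"])
test_embeddings = embed(data["test"]["text"])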

What If We Do Not Have Labeled Data?

In our previous example, we had labeled data that we could leverage, but this might not always be the case in practice. Getting labeled data is a resource-intensive task that can require significant human labor. Moreover, is it actually worthwhile to collect these labels?

To test this, we can perform zero-shot classification, in which we explore whether the task seems feasible without having any labeled data. Although we know the definitions of the labels (their names), we do not have labeled data to support them. Zero-shot classification attempts to predict the labels of input text even though the model was not trained on them, as shown in Figure 4-13.

Figure 4-13. In zero-shot classification, we have no labeled data, only the labels themselves. The zero-shot model decides how the input is related to the candidate labels.

To perform zero-shot classification with embeddings, there is a neat trick that we can use. We can describe our labels based on what they should represent. For example, a negative label for movie reviews can be described as “This is a negative movie review.” By describing and embedding the labels and documents, we have data that we can work with. This process, as illustrated in Figure 4-14, allows us to generate our own target labels without the need to actually have any labeled data.

Figure 4-14. To embed the labels, we first need to give them a description, such as “a negative movie review.” This can then be embedded through sentence-transformers.

We can create these label embeddings using the .encode function as we did earlier:

# Create embeddings for our labels
label_embeddings = model.encode(["A negative review",  "A positive review"])

To assign labels to documents, we can apply cosine similarity to the document-label pairs. Cosine similarity is the cosine of the angle between two vectors, calculated as the dot product of the embeddings divided by the product of their lengths, as illustrated in Figure 4-15.

Figure 4-15. Cosine similarity is the cosine of the angle between two vectors, or embeddings. In this example, we calculate the similarity between a document and the two possible labels, positive and negative.
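
Written out, this is a one-line computation. The small sketch below mirrors the formula for a single document-label pair, using the embeddings we created earlier:

import numpy as np

def cos_sim(a, b):
    """Dot product divided by the product of the vector lengths (norms)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Similarity between the first test document and the "negative" label description
print(cos_sim(test_embeddings[0], label_embeddings[0]))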

We can use cosine similarity to check how similar a given document is to the description of the candidate labels. The label with the highest similarity to the document is chosen as illustrated in Figure 4-16.

Figure 4-16. After embedding the label descriptions and the documents, we can use cosine similarity for each document-label pair.

To perform cosine similarity on the embeddings, we only need to compare the document embeddings with the label embeddings and get the best matching pairs:

from sklearn.metrics.pairwise import cosine_similarity

# Find the best matching label for each document
sim_matrix = cosine_similarity(test_embeddings, label_embeddings)
y_pred = np.argmax(sim_matrix, axis=1)

And that is it! We only needed to come up with names for our labels to perform our classification tasks. Let’s see how well this method works:

evaluate_performance(data["test"]["label"], y_pred)
                precision    recall  f1-score   support

Negative Review       0.78      0.77      0.78       533
Positive Review       0.77      0.79      0.78       533

       accuracy                           0.78      1066
      macro avg       0.78      0.78      0.78      1066
   weighted avg       0.78      0.78      0.78      1066

Note

If you are familiar with zero-shot classification with Transformer-based models, you might wonder why we choose to illustrate this with embeddings instead. Although natural language inference models are amazing for zero-shot classification, the example here demonstrates the flexibility of embeddings for a variety of tasks. As you will see throughout the book, embeddings can be found in most Language AI use cases and are often an underestimated but incredibly vital component.

An F1 score of 0.78 is quite impressive considering we did not use any labeled data at all! This just shows how versatile and useful embeddings are, especially if you are a bit creative with how they are used.

Tip

Let’s put that creativity to the test. We decided upon “A negative/positive review” as the name of our labels but that can be improved. Instead, we can make them a bit more concrete and specific toward our data by using “A very negative/positive movie review” instead. This way, the embedding will capture that it is a movie review and will focus a bit more on the extremes of the two labels. Try it out and explore how it affects the results.
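
A minimal sketch of that experiment, reusing the embedding model and test embeddings from before:

# Embed more specific label descriptions and rerun the zero-shot comparison
label_embeddings = model.encode(["A very negative movie review", "A very positive movie review"])

sim_matrix = cosine_similarity(test_embeddings, label_embeddings)
y_pred = np.argmax(sim_matrix, axis=1)
evaluate_performance(data["test"]["label"], y_pred)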

Text Classification with Generative Models

Classification with generative language models, such as OpenAI’s GPT models, works a bit differently from what we have done thus far. These models take some text as input and generate text, and are thereby aptly named sequence-to-sequence models. This is in stark contrast to our task-specific model, which outputs a class instead, as illustrated in Figure 4-17.

Figure 4-17. A task-specific model generates numerical values from sequences of tokens while a generative model generates sequences of tokens from sequences of tokens.

These generative models are generally trained on a wide variety of tasks and usually do not perform your use case out of the box. For instance, if we give a generative model a movie review without any context, it has no idea what to do with it.

Instead, we need to help it understand the context and guide it toward the answers that we are looking for. As demonstrated in Figure 4-18, this guiding process is done mainly through the instruction, or prompt, that you give such a model. Iteratively improving your prompt to get your preferred output is called prompt engineering.

Figure 4-18. Prompt engineering allows prompts to be updated to improve the output generated by the model.

In this section, we will demonstrate how we can leverage different types of generative models to perform classification on our Rotten Tomatoes dataset.

Using the Text-to-Text Transfer Transformer

Throughout this book, we will explore mostly encoder-only (representation) models like BERT and decoder-only (generative) models like ChatGPT. However, as discussed in Chapter 1, the original Transformer architecture actually consists of an encoder-decoder architecture. Like the decoder-only models, these encoder-decoder models are sequence-to-sequence models and generally fall in the category of generative models.

An interesting family of models that leverage this architecture is the Text-to-Text Transfer Transformer, or T5. Illustrated in Figure 4-19, its architecture is similar to the original Transformer, where 12 encoders and 12 decoders are stacked together.7

Figure 4-19. The T5 architecture is similar to the original Transformer model, an encoder-decoder architecture.

With this architecture, these models were first pretrained using masked language modeling. As illustrated in Figure 4-20, instead of masking individual tokens, sets of tokens (or token spans) were masked during this first step of training.

Figure 4-20. In the first step of training, namely pretraining, the T5 model needs to predict masks that could contain multiple tokens.

The second step of training, namely fine-tuning the base model, is where the real magic happens. Instead of fine-tuning the model for one specific task, each task is converted to a sequence-to-sequence task and trained simultaneously. As illustrated in Figure 4-21, this allows the model to be trained on a wide variety of tasks.

Figure 4-21. By converting specific tasks to textual instructions, the T5 model can be trained on a variety of tasks during fine-tuning.

This method of fine-tuning was extended in the paper “Scaling instruction-finetuned language models”, which introduced more than a thousand tasks during fine-tuning that more closely follow instructions as we know them from GPT models.8 This resulted in the Flan-T5 family of models that benefit from this large variety of tasks.

To use this pretrained Flan-T5 model for classification, we will start by loading it through the "text2text-generation" task, which is generally reserved for these encoder-decoder models:

# Load our model
pipe = pipeline(
    "text2text-generation", 
    model="google/flan-t5-small", 
    device="cuda:0"
)

The Flan-T5 model comes in various sizes (flan-t5-small/base/large/xl/xxl) and we will use the smallest to speed things up a bit. However, feel free to play around with larger models to see if you can improve the results.

Unlike with our task-specific model, we cannot just give the model some text and hope it will output the sentiment. Instead, we will have to instruct the model to do so.

Thus, we prefix each document with the prompt “Is the following sentence positive or negative?”:

# Prepare our data
prompt = "Is the following sentence positive or negative? "
data = data.map(lambda example: {"t5": prompt + example['text']})
data
DatasetDict({
    train: Dataset({
        features: ['text', 'label', 't5'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label', 't5'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label', 't5'],
        num_rows: 1066
    })
})

After creating our updated data, we can run the pipeline similar to the task-specific example:

# Run inference
y_pred = []
for output in tqdm(pipe(KeyDataset(data["test"], "t5")), total=len(data["test"])):
    text = output[0]["generated_text"]
    y_pred.append(0 if text == "negative" else 1)

Since this model generates text, we did need to convert the textual output to numerical values. The output word “negative” was mapped to 0 whereas “positive” was mapped to 1.

These numerical values now allow us to test the quality of the model in the same way we have done before:

evaluate_performance(data["test"]["label"], y_pred)
                precision    recall  f1-score   support

Negative Review       0.83      0.85      0.84       533
Positive Review       0.85      0.83      0.84       533

       accuracy                           0.84      1066
      macro avg       0.84      0.84      0.84      1066
   weighted avg       0.84      0.84      0.84      1066

With an F1 score of 0.84, it is clear this Flan-T5 model is an amazing first look into the capabilities of generative models.

ChatGPT for Classification

Although we focus throughout the book on open source models, another major component of the Language AI field is closed source models; in particular, ChatGPT.

Although the underlying architecture of the original ChatGPT model (GPT-3.5) is not shared, we can assume from its name that it is based on the decoder-only architecture that we have seen in the GPT models thus far.

Fortunately, OpenAI shared an overview of the training procedure that involved an important component, namely preference tuning. As illustrated in Figure 4-22, OpenAI first manually created the desired output to an input prompt (instruction data) and used that data to create a first variant of its model.

Figure 4-22. Manually labeled data consisting of an instruction (prompt) and output was used to perform fine-tuning (instruction-tuning).

OpenAI used the resulting model to generate multiple outputs that were manually ranked from best to worst. As shown in Figure 4-23, this ranking demonstrates a preference for certain outputs (preference data) and was used to create its final model, ChatGPT.

Figure 4-23. Manually ranked preference data was used to generate the final model, ChatGPT.

A major benefit of using preference data over instruction data is the nuance it represents. By demonstrating the difference between a good and a better output, the generative model learns to generate text that resembles human preference. In Chapter 12, we will explore how these fine-tuning and preference-tuning methodologies work and how you can perform them yourself.

The process of using a closed source model is quite different from the open source examples we have seen thus far. Instead of loading the model, we access it through OpenAI’s API.

Before we go into the classification example, you will first need to create a free account on https://oreil.ly/AEXvA and create an API key here: https://oreil.ly/lrTXl. After doing so, you can use your API key to communicate with OpenAI’s servers.

We can use this key to create a client:

import openai

# Create client
client = openai.OpenAI(api_key="YOUR_KEY_HERE")

Using this client, we create the chatgpt_generation function, which allows us to generate some text based on a specific prompt, input document, and the selected model:

def chatgpt_generation(prompt, document, model="gpt-3.5-turbo-0125"):
    """Generate an output based on a prompt and an input document."""
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": prompt.replace("[DOCUMENT]", document)
        }
    ]
    chat_completion = client.chat.completions.create(
        messages=messages,
        model=model,
        temperature=0
    )
    return chat_completion.choices[0].message.content

Next, we will need to create a template to ask the model to perform the classification:

# Define a prompt template as a base
prompt = """Predict whether the following document is a positive or negative movie review:

[DOCUMENT]

If it is positive return 1 and if it is negative return 0. Do not give any other answers.
"""

# Predict the target using GPT
document = "unpretentious , charming , quirky , original"
chatgpt_generation(prompt, document)

This template is merely an example and can be changed however you want. For now, we kept it as simple as possible to illustrate how to use such a template.

Before you use this over a potentially large dataset, it is important to always keep track of your usage. External APIs such as OpenAI’s offering can quickly become costly if you perform many requests. At the time of writing, running our test dataset using the “gpt-3.5-turbo-0125” model costs 3 cents, which is covered by the free account, but this might change in the future.
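
To get a rough sense of the input size before sending anything, you can count tokens locally with the tiktoken package. The sketch below only estimates the prompt tokens, and the price per token is a placeholder, since pricing changes over time:

import tiktoken

# Tokenizer used by the gpt-3.5-turbo family
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Count the prompt tokens we would send for the full test set
num_tokens = sum(
    len(encoding.encode(prompt.replace("[DOCUMENT]", doc)))
    for doc in data["test"]["text"]
)

# Placeholder price; check OpenAI's pricing page for the current rate
price_per_1k_input_tokens = 0.0005
print(num_tokens, num_tokens / 1000 * price_per_1k_input_tokens)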

Tip

When dealing with external APIs, you might run into rate limit errors. These appear when you call the API too often as some APIs might limit the rate with which you can use it per minute or hour.

To prevent these errors, we can implement several methods for retrying the request, including something referred to as exponential backoff. It performs a short sleep each time we hit a rate limit error and then retries the unsuccessful request. Whenever it is unsuccessful again, the sleep length is increased until the request is successful or we hit a maximum number of retries.

To use it with OpenAI, there is a great guide that can help you get started.
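
A bare-bones version of exponential backoff, wrapping the chatgpt_generation function we defined earlier, could look like the sketch below; the guide mentioned above shows more complete implementations, for example using the tenacity package:

import time
import openai

def generate_with_backoff(prompt, document, max_retries=6):
    """Retry chatgpt_generation with exponentially increasing sleep times."""
    delay = 1
    for _ in range(max_retries):
        try:
            return chatgpt_generation(prompt, document)
        except openai.RateLimitError:
            # Wait longer after each failed attempt before retrying
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("Maximum number of retries exceeded")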

Next, we can run this for all reviews in the test dataset to get its predictions. You can skip this if you want to save your (free) credits for other tasks.

# You can skip this if you want to save your (free) credits
predictions = [
    chatgpt_generation(prompt, doc) for doc in tqdm(data["test"]["text"])
]

Like the previous example, we need to convert the output from strings to integers to evaluate its performance:

# Extract predictions
y_pred = [int(pred) for pred in predictions]

# Evaluate performance
evaluate_performance(data["test"]["label"], y_pred)
                precision    recall  f1-score   support

Negative Review       0.87      0.97      0.92       533
Positive Review       0.96      0.86      0.91       533

       accuracy                           0.91      1066
      macro avg       0.92      0.91      0.91      1066
   weighted avg       0.92      0.91      0.91      1066

The F1 score of 0.91 already gives a glimpse into the performance of the model that brought generative AI to the masses. However, since we do not know what data the model was trained on, we cannot easily use these kinds of metrics for evaluating the model. For all we know, it might have actually been trained on our dataset!

In Chapter 12, we will explore how we can evaluate both open source and closed source models on more generalized tasks.

Summary

In this chapter, we discussed many different techniques for performing a wide variety of classification tasks, from fine-tuning your entire model to no tuning at all! Classifying textual data is not as straightforward as it may seem on the surface and there is an incredible amount of creative techniques for doing so.

In this chapter, we explored text classification using both generative and representation language models. Our goal was to assign a label or class to input text for the classification of a review’s sentiment.

We explored two types of representation models, a task-specific model and an embedding model. The task-specific model had been fine-tuned on a large dataset specifically for sentiment analysis and showed us that pretrained models are a great technique for classifying documents. The embedding model was used to generate multipurpose embeddings that we used as the input to train a classifier.

Similarly, we explored two types of generative models, an open source encoder-decoder model (Flan-T5) and a closed source decoder-only model (GPT-3.5). We used these generative models in text classification without requiring specific (additional) training on domain data or labeled datasets.

In the next chapter, we will continue with classification but focus instead on unsupervised classification. What can we do if we have textual data without any labels? What information can we extract? We will focus on clustering our data as well as naming the clusters with topic modeling techniques.

1 Bo Pang and Lillian Lee. “Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.” arXiv preprint cs/0506075 (2005).

2 Yinhan Liu et al. “RoBERTa: A robustly optimized BERT pretraining approach.” arXiv preprint arXiv:1907.11692 (2019).

3 Victor Sanh et al. “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.” arXiv preprint arXiv:1910.01108 (2019).

4 Zhenzhong Lan et al. “ALBERT: A lite BERT for self-supervised learning of language representations.” arXiv preprint arXiv:1909.11942 (2019).

5 Pengcheng He et al. “DeBERTa: Decoding-enhanced BERT with disentangled attention.” arXiv preprint arXiv:2006.03654 (2020).

6 Nils Reimers and Iryna Gurevych. “Sentence-BERT: Sentence embeddings using Siamese BERT-networks.” arXiv preprint arXiv:1908.10084 (2019).

7 Colin Raffel et al. “Exploring the limits of transfer learning with a unified text-to-text transformer.” The Journal of Machine Learning Research 21.1 (2020): 5485–5551.

8 Hyung Won Chung et al. “Scaling instruction-finetuned language models.” arXiv preprint arXiv:2210.11416 (2022).
