
Build a Large Language Model (From Scratch)

Name: Build a Large Language Model (From Scratch)
Author: Sebastian Raschka
ISBN: 9781633437166

by Sebastian Raschka

September 2024

Beginner to intermediate

368 pages

9h 38m

English

Manning Publications


Build a Large Language Model (From Scratch)
copyright
contents
preface
acknowledgments
about this book
about the author
about the cover illustration
1 Understanding large language models
1.1 What is an LLM?
1.2 Applications of LLMs
1.3 Stages of building and using LLMs
1.4 Introducing the transformer architecture
1.5 Utilizing large datasets
1.6 A closer look at the GPT architecture
1.7 Building a large language model
2 Working with text data
2.1 Understanding word embeddings
2.2 Tokenizing text
2.3 Converting tokens into token IDs
2.4 Adding special context tokens
2.5 Byte pair encoding
2.6 Data sampling with a sliding window
2.7 Creating token embeddings
2.8 Encoding word positions

3 Coding attention mechanisms
3.1 The problem with modeling long sequences
3.2 Capturing data dependencies with attention mechanisms
3.3 Attending to different parts of the input with self-attention
3.3.1 A simple self-attention mechanism without trainable weights
3.3.2 Computing attention weights for all input tokens
3.4 Implementing self-attention with trainable weights
3.4.1 Computing the attention weights step by step
3.4.2 Implementing a compact self-attention Python class
3.5 Hiding future words with causal attention
3.5.1 Applying a causal attention mask
3.5.2 Masking additional attention weights with dropout
3.5.3 Implementing a compact causal attention class
3.6 Extending single-head attention to multi-head attention
3.6.1 Stacking multiple single-head attention layers
3.6.2 Implementing multi-head attention with weight splits
4 Implementing a GPT model from scratch to generate text
4.1 Coding an LLM architecture
4.2 Normalizing activations with layer normalization
4.3 Implementing a feed forward network with GELU activations
4.4 Adding shortcut connections
4.5 Connecting attention and linear layers in a transformer block
4.6 Coding the GPT model
4.7 Generating text
5 Pretraining on unlabeled data
5.1 Evaluating generative text models
5.1.1 Using GPT to generate text
5.1.2 Calculating the text generation loss
5.1.3 Calculating the training and validation set losses
5.2 Training an LLM
5.3 Decoding strategies to control randomness
5.3.1 Temperature scaling
5.3.2 Top-k sampling
5.3.3 Modifying the text generation function
5.4 Loading and saving model weights in PyTorch
5.5 Loading pretrained weights from OpenAI
6 Fine-tuning for classification
6.1 Different categories of fine-tuning
6.2 Preparing the dataset
6.3 Creating data loaders
6.4 Initializing a model with pretrained weights
6.5 Adding a classification head
6.6 Calculating the classification loss and accuracy
6.7 Fine-tuning the model on supervised data
6.8 Using the LLM as a spam classifier
7 Fine-tuning to follow instructions
7.1 Introduction to instruction fine-tuning
7.2 Preparing a dataset for supervised instruction fine-tuning
7.3 Organizing data into training batches
7.4 Creating data loaders for an instruction dataset
7.5 Loading a pretrained LLM
7.6 Fine-tuning the LLM on instruction data
7.7 Extracting and saving responses
7.8 Evaluating the fine-tuned LLM
7.9 Conclusions
7.9.1 What’s next?
7.9.2 Staying up to date in a fast-moving field
7.9.3 Final words
appendix A Introduction to PyTorch
A.1 What is PyTorch?
A.1.1 The three core components of PyTorch
A.1.2 Defining deep learning
A.1.3 Installing PyTorch
A.2 Understanding tensors
A.2.1 Scalars, vectors, matrices, and tensors
A.2.2 Tensor data types
A.2.3 Common PyTorch tensor operations
A.3 Seeing models as computation graphs
A.4 Automatic differentiation made easy
A.5 Implementing multilayer neural networks
A.6 Setting up efficient data loaders
A.7 A typical training loop
A.8 Saving and loading models
A.9 Optimizing training performance with GPUs
A.9.1 PyTorch computations on GPU devices
A.9.2 Single-GPU training
A.9.3 Training with multiple GPUs
appendix B References and further reading
appendix C Exercise solutions
appendix D Adding bells and whistles to the training loop
D.1 Learning rate warmup
D.2 Cosine decay
D.3 Gradient clipping
D.4 The modified training function
appendix E Parameter-efficient fine-tuning with LoRA
E.1 Introduction to LoRA
E.2 Preparing the dataset
E.3 Initializing the model
E.4 Parameter-efficient fine-tuning with LoRA

Overview

How to implement LLM attention mechanisms and GPT-style transformers.

Bestselling author Sebastian Raschka guides you step by step through creating your own LLM. Each stage is explained with clear text, diagrams, and examples. You’ll go from the initial design and creation, to pretraining on a general corpus, and on to fine-tuning for specific tasks.

Build a Large Language Model (From Scratch) teaches you how to:

Plan and code all the parts of an LLM
Prepare a dataset suitable for LLM training
Fine-tune LLMs for text classification and with your own data
Use human feedback to ensure your LLM follows instructions
Load pretrained weights into an LLM

Build a Large Language Model (From Scratch) takes you inside the AI black box to tinker with the internal systems that power generative AI. As you work through each key stage of LLM creation, you’ll develop an in-depth understanding of how LLMs work, their limitations, and their customization methods. Your LLM can be developed on an ordinary laptop and used as your own personal assistant.

About the Technology
Physicist Richard P. Feynman reportedly said, “I don’t understand anything I can’t build.” Based on this same powerful principle, bestselling author Sebastian Raschka guides you step by step as you build a GPT-style LLM that you can run on your laptop. This is an engaging book that covers each stage of the process, from planning and coding to training and fine-tuning.

About the Book
Build a Large Language Model (From Scratch) is a practical and eminently satisfying hands-on journey into the foundations of generative AI. Without relying on any existing LLM libraries, you’ll code a base model, evolve it into a text classifier, and ultimately create a chatbot that can follow your conversational instructions. And you’ll really understand it because you built it yourself!

For deeper understanding and better learning, a testing system is built into liveBook, the online version of this book. Separately, a free PDF Test Yourself guide for this book is available for download.

What's Inside

Plan and code an LLM comparable to GPT-2
Load pretrained weights
Construct a complete training pipeline
Fine-tune your LLM for text classification
Develop LLMs that follow human instructions

About the Reader
Readers need intermediate Python skills and some knowledge of machine learning. The LLM you create will run on any modern laptop and can optionally utilize GPUs.

About the Authors
Sebastian Raschka, PhD, is an LLM Research Engineer with over a decade of experience in artificial intelligence. His work spans industry and academia, including implementing LLM solutions as a senior engineer at Lightning AI and teaching as a statistics professor at the University of Wisconsin–Madison.

Sebastian collaborates with Fortune 500 companies on AI solutions and serves on the Open Source Board at the University of Wisconsin–Madison. He specializes in LLMs and the development of high-performance AI systems, with a deep focus on practical, code-driven implementations. He is the author of the bestselling books Machine Learning with PyTorch and Scikit-Learn, and Machine Learning Q and AI.

The technical editor on this book was David Caswell.

Quotes
Truly inspirational! It motivates you to put your new skills into action.
- Benjamin Muskalla, Senior Engineer, GitHub

The most understandable and comprehensive explanation of language models yet! Its unique and practical teaching style achieves a level of understanding you can’t get any other way.
- Cameron Wolfe, Senior Scientist, Netflix

Sebastian combines deep knowledge with practical engineering skills and a knack for making complex ideas simple. This is the guide you need!
- Chip Huyen, author of Designing Machine Learning Systems and AI Engineering

Definitive, up-to-date coverage. Highly recommended!
- Dr. Vahid Mirjalili, Senior Data Scientist, FM Global

