Build a Large Language Model (From Scratch)

by Sebastian Raschka

Appendix D: Adding bells and whistles to the training loop

In this appendix, we enhance the training function used for the pretraining and fine-tuning processes covered in chapters 5 to 7. In particular, we cover learning rate warmup, cosine decay, and gradient clipping. We then incorporate these techniques into the training function and pretrain an LLM.
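As a rough preview (a minimal sketch, not the book's exact implementation), the two learning-rate techniques combine as follows: linear warmup ramps the learning rate from a small initial value up to a peak over the first few steps, after which cosine decay lowers it smoothly toward a minimum. The function name get_lr and its signature below are illustrative assumptions:

import math

def get_lr(step, warmup_steps, total_steps, peak_lr, min_lr=0.0):
    # Linear warmup: ramp the learning rate from peak_lr / warmup_steps
    # up to peak_lr over the first warmup_steps steps
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay: follow half a cosine wave from peak_lr down to
    # min_lr over the remaining training steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))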

To make the code self-contained, we reinitialize the model we trained in chapter 5:

import torch
from chapter04 import GPTModel

GPT_CONFIG_124M = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 256,   # Shortened context length (originally 1024)
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.1,        # Dropout rate
    "qkv_bias": False        # Query-key-value bias
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(123)
...