January 2025
Beginner to intermediate
432 pages
13h 16m
English
This chapter covers
In chapter 11, we developed the GPT-2XL model from scratch but were unable to train it due to its vast number of parameters. Training a model with 1.5 billion parameters requires supercomputing facilities and an enormous amount of data. Consequently, we loaded pretrained weights from OpenAI into our model and then used the GPT-2XL model to generate text.
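To see where the figure of roughly 1.5 billion parameters comes from, here is a minimal sketch that counts the weights in a GPT-2-style decoder. It assumes the publicly released GPT-2 XL dimensions (48 layers, 1,600-dimensional embeddings, a 50,257-token vocabulary, and a 1,024-token context). It also assumes biases on every linear layer and an output head that shares weights with the token embedding, as in OpenAI's implementation. The names `GPT2Config`, `gpt2_param_count`, and `adam_fp32_training_bytes` are hypothetical helpers written for illustration. The memory figure is only a rough lower bound: it counts fp32 weights, gradients, and the two Adam moment buffers, and ignores activations entirely.

```python
from dataclasses import dataclass


@dataclass
class GPT2Config:
    # Hypothetical config; defaults are the public GPT-2 XL dimensions (assumption).
    vocab_size: int = 50257
    n_ctx: int = 1024
    n_embd: int = 1600
    n_layer: int = 48


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def gpt2_param_count(cfg: GPT2Config) -> int:
    d = cfg.n_embd
    # Token embedding plus learned positional embedding.
    embeddings = cfg.vocab_size * d + cfg.n_ctx * d
    # Each LayerNorm has a scale and a shift vector of size d.
    layer_norm = 2 * d
    # Fused Q/K/V projection (d -> 3d) followed by the output projection (d -> d).
    attn = linear_params(d, 3 * d) + linear_params(d, d)
    # Feed-forward network expands to 4d and projects back to d.
    mlp = linear_params(d, 4 * d) + linear_params(4 * d, d)
    block = 2 * layer_norm + attn + mlp
    # Final LayerNorm; the output head is assumed tied to the token embedding,
    # so it adds no extra parameters.
    return embeddings + cfg.n_layer * block + layer_norm


def adam_fp32_training_bytes(n_params: int) -> int:
    # 4 bytes (weights) + 4 (gradients) + 8 (Adam first and second moments).
    # Activations are excluded, so real training needs considerably more.
    return 16 * n_params


if __name__ == "__main__":
    n = gpt2_param_count(GPT2Config())
    print(f"{n:,} parameters")  # 1,557,611,200 parameters
    print(f"{adam_fp32_training_bytes(n) / 1e9:.1f} GB before activations")  # 24.9 GB
```

Dropping the configuration to 12 layers with 768-dimensional embeddings yields 124,439,808 parameters, the familiar size of the smallest GPT-2. Almost all of the XL model's weights sit in its 48 repeated blocks, which is why depth and width drive the training cost so sharply.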
However, learning how to train a Transformer model from scratch is crucial for several reasons. First, while this ...