4 Implementing a GPT model from scratch to generate text

This chapter covers

  • Coding a GPT-like large language model (LLM) that can be trained to generate human-like text
  • Normalizing layer activations to stabilize neural network training (see the code sketch after this list)
  • Adding shortcut connections in deep neural networks
  • Implementing transformer blocks to create GPT models of various sizes
  • Computing the number of parameters and storage requirements of GPT models
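
As a preview of one of these components, here is a minimal layer normalization sketch in PyTorch. It is an illustrative version under common assumptions (normalization over the last dimension, a small eps of 1e-5, and learnable scale and shift parameters), not necessarily the chapter's exact listing; the class and variable names are chosen for clarity.

    import torch
    import torch.nn as nn

    class LayerNorm(nn.Module):
        # Minimal sketch: normalizes each token's activation vector to
        # zero mean and unit variance, then applies learnable scale and
        # shift parameters so the network can undo the normalization
        # where that helps training.
        def __init__(self, emb_dim, eps=1e-5):
            super().__init__()
            self.eps = eps  # small constant to avoid division by zero
            self.scale = nn.Parameter(torch.ones(emb_dim))
            self.shift = nn.Parameter(torch.zeros(emb_dim))

        def forward(self, x):
            mean = x.mean(dim=-1, keepdim=True)
            var = x.var(dim=-1, keepdim=True, unbiased=False)
            norm_x = (x - mean) / torch.sqrt(var + self.eps)
            return self.scale * norm_x + self.shift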

You’ve already learned and coded the multi-head attention mechanism, one of the core components of LLMs. Now, we will code the other building blocks of an LLM and assemble them into a GPT-like model that we will train in the next chapter to generate human-like text.
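
To make "GPT-like" concrete before any code is written, it helps to pin down a model configuration and estimate its size. The sketch below is a back-of-the-envelope illustration, not the chapter's exact listing: the dictionary keys and the helper function are hypothetical names, and the parameter formula assumes the standard GPT-2 layout (learned token and positional embeddings, transformer blocks with a 4x feed-forward expansion, two layer norms per block, and a final layer norm before the output projection).

    # Hypothetical configuration roughly matching the smallest GPT-2 model
    GPT_CONFIG_124M = {
        "vocab_size": 50257,     # BPE vocabulary size used by GPT-2
        "context_length": 1024,  # maximum number of input tokens
        "emb_dim": 768,          # embedding (hidden) dimension
        "n_heads": 12,           # attention heads per transformer block
        "n_layers": 12,          # number of transformer blocks
        "drop_rate": 0.1,        # dropout rate
        "qkv_bias": False,       # bias terms in query/key/value projections
    }

    def count_gpt2_params(cfg, tie_weights=True):
        # Back-of-the-envelope parameter count for a GPT-2-style model.
        d, v, c = cfg["emb_dim"], cfg["vocab_size"], cfg["context_length"]
        qkv_b = d if cfg["qkv_bias"] else 0
        embeddings = v * d + c * d                   # token + positional tables
        attn = 3 * (d * d + qkv_b) + (d * d + d)     # Q/K/V + output projection
        ffn = (d * 4 * d + 4 * d) + (4 * d * d + d)  # two linear layers, 4x width
        norms = 2 * (2 * d)                          # two layer norms per block
        blocks = cfg["n_layers"] * (attn + ffn + norms)
        final_norm = 2 * d
        out_head = 0 if tie_weights else v * d       # tied: reuse token embedding
        return embeddings + blocks + final_norm + out_head

    n = count_gpt2_params(GPT_CONFIG_124M, tie_weights=False)
    print(f"{n:,} parameters, about {n * 4 / 1024**2:.0f} MB in 32-bit floats")
    # 163,009,536 parameters, about 622 MB in 32-bit floats

With tie_weights=True, that is, sharing the token embedding matrix with the output layer as the original GPT-2 does, the count drops to about 124 million, which is where the familiar "GPT-2 124M" label comes from.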

Figure 4.1 The three main stages of coding ...
