4 Implementing a GPT model from scratch to generate text

This chapter covers

  • Coding a GPT-like large language model (LLM) that can be trained to generate human-like text
  • Normalizing layer activations to stabilize neural network training (see the code sketch after this list)
  • Adding shortcut connections in deep neural networks
  • Implementing transformer blocks to create GPT models of various sizes
  • Computing the number of parameters and storage requirements of GPT models
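
As a preview of one of these components, here is a minimal layer normalization sketch in PyTorch. It is an illustrative version under common assumptions (normalization over the last dimension, a small eps of 1e-5, and learnable scale and shift parameters), not necessarily the chapter's exact listing; the class and variable names are chosen for clarity.

    import torch
    import torch.nn as nn

    class LayerNorm(nn.Module):
        # Minimal sketch: normalizes each token's activation vector to
        # zero mean and unit variance, then applies learnable scale and
        # shift parameters so the network can undo the normalization
        # where that helps training.
        def __init__(self, emb_dim, eps=1e-5):
            super().__init__()
            self.eps = eps  # small constant to avoid division by zero
            self.scale = nn.Parameter(torch.ones(emb_dim))
            self.shift = nn.Parameter(torch.zeros(emb_dim))

        def forward(self, x):
            mean = x.mean(dim=-1, keepdim=True)
            var = x.var(dim=-1, keepdim=True, unbiased=False)
            norm_x = (x - mean) / torch.sqrt(var + self.eps)
            return self.scale * norm_x + self.shift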

You’ve already learned and coded the multi-head attention mechanism, one of the core components of LLMs. Now, we will code the other building blocks of an LLM and assemble them into a GPT-like model that we will train in the next chapter to generate human-like text.
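
To make "GPT-like" concrete before any code is written, it helps to pin down a model configuration and estimate its size. The sketch below is a back-of-the-envelope illustration, not the chapter's exact listing: the dictionary keys and the helper function are hypothetical names, and the parameter formula assumes the standard GPT-2 layout (learned token and positional embeddings, transformer blocks with a 4x feed-forward expansion, two layer norms per block, and a final layer norm before the output projection).

    # Hypothetical configuration roughly matching the smallest GPT-2 model
    GPT_CONFIG_124M = {
        "vocab_size": 50257,     # BPE vocabulary size used by GPT-2
        "context_length": 1024,  # maximum number of input tokens
        "emb_dim": 768,          # embedding (hidden) dimension
        "n_heads": 12,           # attention heads per transformer block
        "n_layers": 12,          # number of transformer blocks
        "drop_rate": 0.1,        # dropout rate
        "qkv_bias": False,       # bias terms in query/key/value projections
    }

    def count_gpt2_params(cfg, tie_weights=True):
        # Back-of-the-envelope parameter count for a GPT-2-style model.
        d, v, c = cfg["emb_dim"], cfg["vocab_size"], cfg["context_length"]
        qkv_b = d if cfg["qkv_bias"] else 0
        embeddings = v * d + c * d                   # token + positional tables
        attn = 3 * (d * d + qkv_b) + (d * d + d)     # Q/K/V + output projection
        ffn = (d * 4 * d + 4 * d) + (4 * d * d + d)  # two linear layers, 4x width
        norms = 2 * (2 * d)                          # two layer norms per block
        blocks = cfg["n_layers"] * (attn + ffn + norms)
        final_norm = 2 * d
        out_head = 0 if tie_weights else v * d       # tied: reuse token embedding
        return embeddings + blocks + final_norm + out_head

    n = count_gpt2_params(GPT_CONFIG_124M, tie_weights=False)
    print(f"{n:,} parameters, about {n * 4 / 1024**2:.0f} MB in 32-bit floats")
    # 163,009,536 parameters, about 622 MB in 32-bit floats

With tie_weights=True, that is, sharing the token embedding matrix with the output layer as the original GPT-2 does, the count drops to about 124 million, which is where the familiar "GPT-2 124M" label comes from.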

Figure 4.1 The three main stages of coding ...
