5 Pretraining on unlabeled data

This chapter covers

Computing the training and validation set losses to assess the quality of LLM-generated text during training
Implementing a training function and pretraining the LLM
Saving and loading model weights to continue training an LLM
Loading pretrained weights from OpenAI
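
The training loss and the validation loss listed above are typically the same quantity: the average cross-entropy between the model's predicted next-token distribution and the token that actually comes next, evaluated on different splits of the data. The following is a minimal pure-Python sketch of that computation, plus perplexity (the exponential of the average loss). It assumes the model emits one vector of raw logits per position. The function names and the logit values are hypothetical illustrations, not the book's code, which works with PyTorch tensors.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def cross_entropy(batch_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Average negative log-likelihood of the target token at each position."""
    if len(batch_logits) != len(targets):
        raise ValueError("need one target token id per logits vector")
    total = 0.0
    for logits, target in zip(batch_logits, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)


def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the average cross-entropy loss."""
    return math.exp(loss)


# Hypothetical logits for 3 positions over a 4-token vocabulary.
logits = [
    [2.0, 0.5, -1.0, 0.0],
    [0.1, 0.2, 3.0, -0.5],
    [1.0, 1.0, 1.0, 1.0],
]
targets = [0, 2, 3]  # the "next token" id at each position
loss = cross_entropy(logits, targets)
print(f"loss={loss:.4f} perplexity={perplexity(loss):.4f}")
```

A model that assigns uniform probability over a vocabulary of size V has loss log(V) and perplexity V, which is a useful sanity check for an untrained model.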

Thus far, we have implemented the data sampling and attention mechanism and coded the LLM architecture. It is now time to implement a training function and pretrain the LLM. We will learn about basic model evaluation techniques to measure the quality of the generated text, which is a requirement for optimizing the LLM during the training process. Moreover, we will discuss how to load pretrained weights, giving our LLM a solid ...
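
To make the idea of a training function concrete, here is a deliberately tiny sketch under strong assumptions: a character-level bigram model (a table of next-token logits) stands in for the GPT architecture, and plain full-batch gradient descent stands in for a PyTorch optimizer. It records the training and validation losses after every epoch, which is the kind of monitoring described above. All names (`train_bigram`, `make_pairs`, `avg_loss`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_pairs(ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Each token is the input; the token that follows it is the target."""
    return list(zip(ids[:-1], ids[1:]))


def avg_loss(W: List[List[float]], pairs: List[Tuple[int, int]]) -> float:
    """Average cross-entropy of the bigram table W on (input, target) pairs."""
    return -sum(math.log(softmax(W[prev])[nxt]) for prev, nxt in pairs) / len(pairs)


def train_bigram(train_ids, val_ids, vocab_size, epochs=100, lr=1.0):
    """Hypothetical toy training function: full-batch gradient descent."""
    W = [[0.0] * vocab_size for _ in range(vocab_size)]  # one logit row per input token
    train_pairs, val_pairs = make_pairs(train_ids), make_pairs(val_ids)
    history = []  # (train_loss, val_loss) after each epoch
    for _ in range(epochs):
        grad = [[0.0] * vocab_size for _ in range(vocab_size)]
        for prev, nxt in train_pairs:
            probs = softmax(W[prev])
            for j in range(vocab_size):
                # Gradient of -log p[nxt] w.r.t. logit j is p[j] - 1[j == nxt].
                grad[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
        for i in range(vocab_size):
            for j in range(vocab_size):
                W[i][j] -= lr * grad[i][j] / len(train_pairs)
        history.append((avg_loss(W, train_pairs), avg_loss(W, val_pairs)))
    return W, history


# Toy corpus, character-level "tokenizer", and a 90/10 train/validation split.
text = "hello world " * 20
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
ids = [stoi[ch] for ch in text]
split = int(0.9 * len(ids))
W, history = train_bigram(ids[:split], ids[split:], len(chars))
for epoch in (0, 49, 99):
    train_loss, val_loss = history[epoch]
    print(f"epoch {epoch + 1}: train loss {train_loss:.3f}, val loss {val_loss:.3f}")
```

A real training function would iterate over mini-batches from a data loader and backpropagate through the full transformer, but the loop structure is the same: compute the loss on training data, update the weights, and periodically evaluate on held-out data to check that the model is not just memorizing the training text.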

Build a Large Language Model (From Scratch) by Sebastian Raschka