3 Coding attention mechanisms

This chapter covers

  • The reasons for using attention mechanisms in neural networks
  • A basic self-attention framework, progressing to an enhanced self-attention mechanism
  • A causal attention module that allows LLMs to generate one token at a time
  • Masking randomly selected attention weights with dropout to reduce overfitting
  • Stacking multiple causal attention modules into a multi-head attention module

At this point, you know how to prepare the input text for training LLMs by splitting it into individual word and subword tokens, which can then be encoded into vector representations (embeddings) for the LLM.
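As a quick recap, the following minimal sketch shows that pipeline end to end. It assumes the BPE tokenizer from the tiktoken package and a PyTorch embedding layer; the example text and the embedding dimension are arbitrary choices for illustration:

```python
import tiktoken
import torch

# Tokenize: split the text into subword token IDs (GPT-2 BPE tokenizer shown here)
tokenizer = tiktoken.get_encoding("gpt2")
token_ids = tokenizer.encode("Attention mechanisms are an integral part of LLMs")
print(token_ids)                        # a list of integer token IDs

# Embed: map each token ID to a dense vector
vocab_size = tokenizer.n_vocab          # 50,257 for the GPT-2 tokenizer
embedding_dim = 256                     # arbitrary size for illustration
torch.manual_seed(123)
embedding_layer = torch.nn.Embedding(vocab_size, embedding_dim)

token_embeddings = embedding_layer(torch.tensor(token_ids))
print(token_embeddings.shape)           # torch.Size([num_tokens, 256])
```

Sequences of such embedding vectors are the inputs that the attention mechanisms in this chapter operate on.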

Now, we will look at an integral part of the LLM architecture itself, attention mechanisms, as illustrated ...
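To preview where this chapter is headed, here is a minimal sketch of the core computation behind scaled dot-product self-attention, written in PyTorch with toy dimensions chosen purely for illustration; the chapter develops this idea step by step and then extends it with trainable weight matrices, causal masking, dropout, and multiple heads:

```python
import torch

torch.manual_seed(123)

# Toy input: a sequence of 4 token embeddings with 6 dimensions each
inputs = torch.randn(4, 6)

# Queries, keys, and values are linear projections of the inputs
# (the output dimension of 8 is an arbitrary choice for this sketch)
d_in, d_out = 6, 8
W_query = torch.nn.Linear(d_in, d_out, bias=False)
W_key   = torch.nn.Linear(d_in, d_out, bias=False)
W_value = torch.nn.Linear(d_in, d_out, bias=False)

queries = W_query(inputs)                # shape (4, 8)
keys    = W_key(inputs)                  # shape (4, 8)
values  = W_value(inputs)                # shape (4, 8)

# Attention scores: similarity of every query with every key
attn_scores = queries @ keys.T           # shape (4, 4)

# Scale and normalize each row into attention weights that sum to 1
attn_weights = torch.softmax(attn_scores / d_out**0.5, dim=-1)

# Context vectors: attention-weighted combinations of the value vectors
context_vecs = attn_weights @ values     # shape (4, 8)
print(context_vecs.shape)

# Causal variant (covered later in the chapter): mask out future positions
# so each token can only attend to itself and earlier tokens
mask = torch.triu(torch.ones(4, 4), diagonal=1).bool()
masked_scores = attn_scores.masked_fill(mask, float("-inf"))
causal_weights = torch.softmax(masked_scores / d_out**0.5, dim=-1)
```

Applying dropout to the attention weights and stacking several such attention modules side by side into multi-head attention are the remaining building blocks this chapter covers.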
