3 Coding attention mechanisms

This chapter covers

  • The reasons for using attention mechanisms in neural networks
  • A basic self-attention framework, progressing to an enhanced self-attention mechanism
  • A causal attention module that allows LLMs to generate one token at a time
  • Masking randomly selected attention weights with dropout to reduce overfitting
  • Stacking multiple causal attention modules into a multi-head attention module

At this point, you know how to prepare the input text for training LLMs by splitting it into individual word and subword tokens, which can then be encoded into vector representations (embeddings) for the LLM.
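As a quick recap, the following minimal sketch shows that pipeline end to end. It assumes the BPE tokenizer from the tiktoken package and a PyTorch embedding layer; the example text and the embedding dimension are arbitrary choices for illustration:

```python
import tiktoken
import torch

# Tokenize: split the text into subword token IDs (GPT-2 BPE tokenizer shown here)
tokenizer = tiktoken.get_encoding("gpt2")
token_ids = tokenizer.encode("Attention mechanisms are an integral part of LLMs")
print(token_ids)                        # a list of integer token IDs

# Embed: map each token ID to a dense vector
vocab_size = tokenizer.n_vocab          # 50,257 for the GPT-2 tokenizer
embedding_dim = 256                     # arbitrary size for illustration
torch.manual_seed(123)
embedding_layer = torch.nn.Embedding(vocab_size, embedding_dim)

token_embeddings = embedding_layer(torch.tensor(token_ids))
print(token_embeddings.shape)           # torch.Size([num_tokens, 256])
```

Sequences of such embedding vectors are the inputs that the attention mechanisms in this chapter operate on.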

Now, we will look at an integral part of the LLM architecture itself, attention mechanisms, as illustrated ...
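To preview where this chapter is headed, here is a minimal sketch of the core computation behind scaled dot-product self-attention, written in PyTorch with toy dimensions chosen purely for illustration; the chapter develops this idea step by step and then extends it with trainable weight matrices, causal masking, dropout, and multiple heads:

```python
import torch

torch.manual_seed(123)

# Toy input: a sequence of 4 token embeddings with 6 dimensions each
inputs = torch.randn(4, 6)

# Queries, keys, and values are linear projections of the inputs
# (the output dimension of 8 is an arbitrary choice for this sketch)
d_in, d_out = 6, 8
W_query = torch.nn.Linear(d_in, d_out, bias=False)
W_key   = torch.nn.Linear(d_in, d_out, bias=False)
W_value = torch.nn.Linear(d_in, d_out, bias=False)

queries = W_query(inputs)                # shape (4, 8)
keys    = W_key(inputs)                  # shape (4, 8)
values  = W_value(inputs)                # shape (4, 8)

# Attention scores: similarity of every query with every key
attn_scores = queries @ keys.T           # shape (4, 4)

# Scale and normalize each row into attention weights that sum to 1
attn_weights = torch.softmax(attn_scores / d_out**0.5, dim=-1)

# Context vectors: attention-weighted combinations of the value vectors
context_vecs = attn_weights @ values     # shape (4, 8)
print(context_vecs.shape)

# Causal variant (covered later in the chapter): mask out future positions
# so each token can only attend to itself and earlier tokens
mask = torch.triu(torch.ones(4, 4), diagonal=1).bool()
masked_scores = attn_scores.masked_fill(mask, float("-inf"))
causal_weights = torch.softmax(masked_scores / d_out**0.5, dim=-1)
```

Applying dropout to the attention weights and stacking several such attention modules side by side into multi-head attention are the remaining building blocks this chapter covers.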
