Understanding Transformers

The foundation that makes language models powerful lies in the transformer architecture. Transformer-based models addressed the main limitations of RNNs, notably their strictly sequential computation and difficulty capturing long-range dependencies, and became the preferred architecture for the latest generation of LLMs. The original transformer network was presented as an encoder-decoder architecture for machine translation tasks. The next evolution of the architecture began with the introduction of encoder-only models such as BERT in 2018, followed by decoder-only networks in the first iteration of the GPT models.
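
To make these three variants concrete, the short sketch below loads one representative model of each kind with the Hugging Face transformers library. The specific checkpoints (t5-small for encoder-decoder, bert-base-uncased for encoder-only, gpt2 for decoder-only) are illustrative choices, not the only options.

```python
# A minimal sketch of the three transformer families using Hugging Face transformers.
# The checkpoints below are illustrative; any model of the same family would do.
from transformers import (
    AutoModelForSeq2SeqLM,  # encoder-decoder (e.g., T5), the original translation-style design
    AutoModel,              # encoder-only (e.g., BERT), produces contextual token embeddings
    AutoModelForCausalLM,   # decoder-only (e.g., GPT), generates text left to right
)

encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
encoder_only = AutoModel.from_pretrained("bert-base-uncased")
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")

for name, model in [
    ("encoder-decoder", encoder_decoder),
    ("encoder-only", encoder_only),
    ("decoder-only", decoder_only),
]:
    print(f"{name}: {model.config.model_type}, {model.num_parameters():,} parameters")
```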

The differences between encoder-only and decoder-only models extend beyond network design to the learning objectives themselves. These models have contrasting learning objectives that are ...
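
One well-known contrast: encoder-only models such as BERT are typically pretrained with a masked-token (fill-in-the-blank) objective, whereas decoder-only models such as GPT are pretrained to predict the next token. The sketch below illustrates both with the Hugging Face transformers library; the checkpoints and prompt text are only illustrative.

```python
# A rough illustration of the two pretraining objectives: masked-token prediction
# (encoder-only, BERT-style) versus next-token prediction (decoder-only, GPT-style).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, AutoModelForCausalLM

# Masked-language-modeling objective: recover the token hidden behind [MASK].
mlm_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
inputs = mlm_tok("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = mlm(**inputs).logits
mask_pos = (inputs.input_ids == mlm_tok.mask_token_id).nonzero(as_tuple=True)[1]
print(mlm_tok.decode(logits[0, mask_pos].argmax(dim=-1)))  # model fills in the blank

# Causal (next-token) objective: predict what comes after the prompt.
clm_tok = AutoTokenizer.from_pretrained("gpt2")
clm = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = clm_tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    next_logits = clm(**prompt).logits[:, -1, :]
print(clm_tok.decode(next_logits.argmax(dim=-1)))  # model predicts the next token
```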
