Chapter 3. Transformer Anatomy

In Chapter 2, we saw what it takes to fine-tune and evaluate transformers. Now let’s take a look at how they work under the hood. In this chapter we’ll explore the main building blocks of transformer models and how to implement them using PyTorch. We’ll also provide guidance on how to do the same in TensorFlow. We’ll first focus on building the attention mechanism (previewed in the sketch below), and then add the bits and pieces necessary to make a transformer encoder work. We’ll also have a brief look at the architectural differences between the encoder and decoder modules. By the end of this chapter you will be able to implement a simple transformer model yourself!
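To give a taste of what we’ll build, here is a minimal sketch of scaled dot-product attention, the core operation of a transformer. The function name, comments, and tensor shapes are our own illustrative choices; the chapter develops the full version step by step:

import torch
import torch.nn.functional as F
from math import sqrt

def scaled_dot_product_attention(query, key, value):
    # query, key, value: [batch_size, seq_len, head_dim]
    dim_k = query.size(-1)
    # Pairwise similarity between tokens, scaled so the softmax
    # stays well behaved as head_dim grows
    scores = torch.bmm(query, key.transpose(1, 2)) / sqrt(dim_k)
    weights = F.softmax(scores, dim=-1)
    # Each output token is a weighted average of the value vectors
    return torch.bmm(weights, value)

In a dozen lines, this captures the idea we’ll keep returning to: every token’s representation is updated by mixing in information from the other tokens, weighted by how relevant they are.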

While a deep technical understanding of the Transformer architecture is generally not necessary to use 🤗 Transformers and fine-tune models for your use case, it can be helpful for comprehending and navigating the limitations of transformers and using them in new domains.

This chapter also introduces a taxonomy of transformers to help you understand the zoo of models that have emerged in recent years. Before diving into the code, let’s start with an overview of the original architecture that kick-started the transformer revolution.

The Transformer Architecture

As we saw in Chapter 1, the original Transformer is based on the encoder-decoder architecture that is widely used for tasks like machine translation, where a sequence of words is translated from one language to another.
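To make the encoder-decoder split concrete, the following sketch wires up PyTorch’s built-in nn.Transformer, which follows the original layout. All hyperparameters and variable names here are toy values of our own choosing, and we omit details such as positional encodings that the full architecture requires:

import torch
import torch.nn as nn

# Toy hyperparameters, chosen only for illustration
vocab_size, d_model, num_heads, num_layers = 10_000, 512, 8, 6

embed = nn.Embedding(vocab_size, d_model)
# PyTorch ships an encoder-decoder transformer matching the original layout
model = nn.Transformer(d_model=d_model, nhead=num_heads,
                       num_encoder_layers=num_layers,
                       num_decoder_layers=num_layers,
                       batch_first=True)

src = embed(torch.randint(0, vocab_size, (1, 12)))  # source sentence, 12 tokens
tgt = embed(torch.randint(0, vocab_size, (1, 9)))   # partial translation, 9 tokens
out = model(src, tgt)  # shape [1, 9, 512]: one hidden state per target token

The encoder turns the source sequence into a set of hidden states, and the decoder consumes those states together with the partial target sequence, which is exactly the division of labor we’ll unpack in the rest of this chapter.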
