Hands-On Machine Learning with Scikit-Learn and PyTorch

by Aurélien Géron
October 2025
Intermediate to advanced
878 pages
26h 47m
English
O'Reilly Media, Inc.

Chapter 15. Transformers for Natural Language Processing and Chatbots

In a landmark 2017 paper titled “Attention Is All You Need”,⁠1 a team of Google researchers proposed a novel neural net architecture named the Transformer, which significantly improved the state of the art in neural machine translation (NMT). In short, the Transformer architecture is simply an encoder-decoder model, very much like the one we built in Chapter 14 for English-to-Spanish translation, and it can be used in exactly the same way (see Figure 15-1):

  1. The source text goes in the encoder, which outputs contextualized embeddings (one per token).

  2. The encoder’s output is then fed to the decoder, along with the translated text so far (starting with a start-of-sequence token).

  3. The decoder predicts the next token for each input token.

  4. The last token output by the decoder is appended to the translation.

  5. Steps 2 to 4 are repeated again and again to produce the full translation, one extra token at a time, until an end-of-sequence token is generated.

During training, we already have the full translation, since it is the target, so it is fed to the decoder in step 2 (starting with a start-of-sequence token), and steps 4 and 5 are not needed.
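The inference loop and the training shortcut described above can be sketched in a few lines of plain Python. This is only an illustration, not the book's code: the `encode` and `decode` callables, the `<sos>` and `<eos>` token strings, the `max_length` cap, and the toy word-lookup "model" are all hypothetical stand-ins for a real Transformer encoder and decoder. The label layout in `teacher_forcing_pair` (target followed by an end-of-sequence token) is the usual convention and is assumed here.

```python
from typing import Callable, List, Sequence, Tuple

SOS, EOS = "<sos>", "<eos>"  # hypothetical special tokens


def greedy_translate(
    source_tokens: Sequence[str],
    encode: Callable[[Sequence[str]], object],
    decode: Callable[[object, Sequence[str]], List[str]],
    max_length: int = 50,  # assumed safety cap, not from the text
) -> List[str]:
    # Step 1: the source text goes through the encoder once.
    encoder_output = encode(source_tokens)
    translation = [SOS]
    for _ in range(max_length):
        # Steps 2-3: the decoder sees the encoder output plus the
        # translation so far, and predicts a next token per position.
        predictions = decode(encoder_output, translation)
        # Step 4: only the prediction at the last position is new.
        next_token = predictions[-1]
        if next_token == EOS:  # Step 5: stop at end-of-sequence
            break
        translation.append(next_token)
    return translation[1:]  # drop the start-of-sequence token


def teacher_forcing_pair(target: Sequence[str]) -> Tuple[List[str], List[str]]:
    # Training: the whole target is fed at once, shifted right by the
    # start-of-sequence token, so no token-by-token loop is needed.
    decoder_input = [SOS] + list(target)
    labels = list(target) + [EOS]
    return decoder_input, labels


# Toy stand-in "model": a word-by-word lookup, for illustration only.
TOY_DICT = {"hello": "hola", "world": "mundo"}


def toy_encode(source_tokens: Sequence[str]) -> List[str]:
    return [TOY_DICT[token] for token in source_tokens] + [EOS]


def toy_decode(encoder_output: List[str], so_far: Sequence[str]) -> List[str]:
    last = len(encoder_output) - 1
    # The decoder input at position i predicts output token i.
    return [encoder_output[min(i, last)] for i in range(len(so_far))]


print(greedy_translate(["hello", "world"], toy_encode, toy_decode))
print(teacher_forcing_pair(["hola", "mundo"]))
```

Note that `greedy_translate` always picks the decoder's single prediction for the last position; a real model outputs a probability distribution over the vocabulary, and other decoding strategies could replace this greedy choice.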

Diagram illustrating the Transformer model's process for translating English to Spanish, showing how the encoder generates contextual embeddings and the decoder predicts the next token in the translated sequence.
Figure 15-1. Using the Transformer model for English-to-Spanish translation

So what’s new? Well, inside the black box, there are some important differences with our previous ...

ISBN: 9798341607972