Paying Attention

To understand transformers, you first need to understand attention. The key feature of transformers is how they apply attention. But what is attention, and where does attention come from?

First, imagine you’re trying to train a model to translate text from English to German. You have a large corpus of text pairs, each consisting of a source sentence in English and a target sentence in German. This is a classic example of machine translation; in the context of deep learning, it’s known more specifically as neural machine translation.
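
In Elixir, you might represent such a corpus as a simple list of tuples. This is just an illustrative sketch; the sentence pairs below are invented, and a real corpus would contain millions of them:

```elixir
# A toy corpus of {source, target} pairs for English-to-German translation.
pairs = [
  {"The weather is nice today.", "Das Wetter ist heute schön."},
  {"I would like a coffee.", "Ich hätte gern einen Kaffee."},
  {"Where is the train station?", "Wo ist der Bahnhof?"}
]
```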

In neural machine translation, the goal is to map a source sequence to a target sequence. From what you know so far, this sounds like a perfect application of recurrent neural networks. Remember, recurrent neural networks process a sequence one element at a time, maintaining a hidden state that summarizes everything the network has seen so far.
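
Before moving on, it may help to preview the computation this chapter builds toward. The following is a minimal sketch of scaled dot-product attention, the formula popularized by the original transformer paper, written with Nx. The module and function names here are illustrative placeholders, not an API from the book:

```elixir
defmodule AttentionSketch do
  import Nx.Defn

  # Softmax along the last axis, stabilized by subtracting the row max.
  defn softmax(t) do
    max = Nx.reduce_max(t, axes: [-1], keep_axes: true)
    e = Nx.exp(t - max)
    e / Nx.sum(e, axes: [-1], keep_axes: true)
  end

  # Scaled dot-product attention for 2-D tensors:
  #   q: {target_len, d}, k: {source_len, d}, v: {source_len, d}
  # Each output row is a weighted average of the rows of v, with
  # weights given by softmax(q . k^T / sqrt(d)).
  defn attend(q, k, v) do
    d = Nx.axis_size(k, -1)
    scores = Nx.dot(q, Nx.transpose(k)) / Nx.sqrt(d)
    Nx.dot(softmax(scores), v)
  end
end
```

With random `q`, `k`, and `v` tensors of shape `{4, 8}`, `AttentionSketch.attend(q, k, v)` returns a `{4, 8}` tensor whose rows are convex combinations of the rows of `v`; the attention weights decide how much each source position contributes to each target position.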
