December 2019
Intermediate to advanced
468 pages
14h 28m
English
We spent the better part of this chapter touting the advantages of the attention mechanism. But so far we've only used attention in the context of RNNs—in that sense, it works as an addition on top of the core recurrent nature of these models. Since attention is so effective, is there a way to use it on its own, without the RNN part? It turns out that there is. The paper Attention Is All You Need (https://arxiv.org/abs/1706.03762) introduces a new encoder-decoder architecture called the transformer, which relies solely on the attention mechanism. First, we'll focus our attention on the transformer attention (pun intended) mechanism.
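Before diving in, here is a minimal sketch of the scaled dot-product attention at the heart of the transformer, using NumPy. The shapes and the softmax-over-keys step follow the paper; the variable names and toy dimensions are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d_k)) v."""
    d_k = q.shape[-1]
    # Similarity of each query to each key, scaled to keep gradients stable
    scores = q @ k.T / np.sqrt(d_k)
    # Softmax over the key dimension (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted sum of the values
    return weights @ v

# Toy example: 3 query positions, 4 key/value positions, dimension 8
rng = np.random.default_rng(0)
q = rng.standard_normal((3, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (3, 8)
```

Note that nothing here is recurrent: every query attends to every key in a single matrix operation, which is exactly what lets the transformer drop the RNN.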