Luong attention (see *Effective Approaches to Attention-based Neural Machine Translation* at https://arxiv.org/abs/1508.04025) introduces several improvements over Bahdanau attention. Most notably, the alignment scores e_t depend on the current decoder hidden state s_t, as opposed to s_{t-1} in Bahdanau attention. To better understand this, let's compare the two algorithms:
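Before walking through the steps, here is a minimal sketch (not the book's implementation) that contrasts the two score functions: Bahdanau's additive score, which uses the previous decoder state s_{t-1}, and Luong's multiplicative score (the "general" form from the paper), which uses the current state s_t. The tensor shapes, layer names, and the choice of the "general" form are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the text)
hidden_size = 128
seq_len = 10

h = torch.randn(seq_len, hidden_size)   # encoder hidden states h_1..h_T
s_t = torch.randn(hidden_size)          # current decoder hidden state s_t (Luong)
s_prev = torch.randn(hidden_size)       # previous decoder hidden state s_{t-1} (Bahdanau)

# Bahdanau (additive) scores: e_{t,i} = v^T tanh(W [s_{t-1}; h_i])
W = nn.Linear(2 * hidden_size, hidden_size, bias=False)
v = nn.Linear(hidden_size, 1, bias=False)
e_bahdanau = v(torch.tanh(W(torch.cat([s_prev.expand(seq_len, -1), h], dim=1)))).squeeze(1)

# Luong (multiplicative, "general" form) scores: e_{t,i} = s_t^T W_m h_i
W_m = nn.Linear(hidden_size, hidden_size, bias=False)
e_luong = W_m(h) @ s_t                  # shape: (seq_len,)

# Both variants turn the scores into attention weights with softmax
alpha_bahdanau = torch.softmax(e_bahdanau, dim=0)
alpha_luong = torch.softmax(e_luong, dim=0)

# The context vector is a weighted sum of the encoder states in either case
c_t = alpha_luong @ h                   # shape: (hidden_size,)
```

The key point the sketch illustrates is only where the decoder state enters the score: s_{t-1} in the additive (Bahdanau) case, s_t in the multiplicative (Luong) case.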
Let's go through a step-by-step execution of Luong attention:
- Feed the encoder with the input sequence and compute the set of encoder hidden states h = {h_1, h_2, ..., h_T}.
- Compute the decoder hidden state s_t based on the previous decoder hidden ...