Attention mechanisms
Neural machine translation (NMT) models suffer from the same long-term dependency problems as RNNs in general. While we saw that LSTMs can mitigate much of this behavior, long sentences remain problematic. This is especially true in machine translation, where the quality of the translation depends largely on how much information is packed into the hidden state of the encoder network, so we must ensure that this final state is as rich as possible. We address this with attention mechanisms.
Attention mechanisms allow the decoder to select parts of the input sentence based on context and on what has been generated thus far. At each decoding step we compute a context vector: alignment scores between the current decoder state and each encoder hidden state are normalized into attention weights, and the context vector is the correspondingly weighted sum of the encoder hidden states.
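To make this concrete, here is a minimal NumPy sketch of one decoding step of dot-product (Luong-style) attention. The function and variable names are our own, and a real NMT model would typically compute scores with learned parameters (for example, a bilinear form or a small feed-forward network) rather than a raw dot product:

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def attention_context(decoder_state, encoder_states):
        # decoder_state: (hidden_dim,), the decoder's current hidden state
        # encoder_states: (src_len, hidden_dim), one hidden state per source token
        scores = encoder_states @ decoder_state   # alignment score per source token
        weights = softmax(scores)                 # normalize into an attention distribution
        context = weights @ encoder_states        # weighted sum: the context vector
        return context, weights

    # Example: 5 source tokens, hidden size 8
    rng = np.random.default_rng(0)
    enc = rng.standard_normal((5, 8))
    dec = rng.standard_normal(8)
    context, weights = attention_context(dec, enc)
    print(weights)        # sums to 1; the largest weight marks the most relevant token
    print(context.shape)  # (8,) -- fed to the decoder alongside its own state

The key design point is that the decoder no longer relies solely on the encoder's final hidden state: at every step it can draw on all of the encoder states, weighted by their relevance to what it is currently generating.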