Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Encoder-decoder with attention

The encoder-decoder architecture described in the previous section has one major shortcoming: because the final encoder state has a fixed length, it can lose information. This may not matter for short phrases, but for longer source-language inputs the encoder may fail to capture long-term dependencies, and the decoder then struggles to produce good translations. To overcome this problem, Bahdanau et al. introduced the attention mechanism in their paper Neural Machine Translation by Jointly Learning to Align and Translate, which also illustrates the architecture with a diagram.

The main idea behind attention is to focus, or pay attention, to the most relevant parts of the source sentence at each decoding step: rather than relying on a single fixed-length summary of the input, the decoder computes a weighted combination of all the encoder hidden states every time it emits a word, as sketched below.
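To make this concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention for a single decoding step. It is not code from the book or the paper: the function name additive_attention, the dimension sizes, and the randomly initialized parameters W_enc, W_dec, and v are illustrative stand-ins for quantities that would normally be learned jointly with the rest of the network.

import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def additive_attention(encoder_states, decoder_state, W_enc, W_dec, v):
    # encoder_states: (src_len, enc_dim) hidden states, one per source token
    # decoder_state:  (dec_dim,) current decoder hidden state
    # W_enc, W_dec, v: projection parameters (illustrative random values here)
    # Score every encoder state against the current decoder state.
    scores = np.tanh(encoder_states @ W_enc + decoder_state @ W_dec) @ v  # (src_len,)
    # Normalize the scores into attention weights over source positions.
    weights = softmax(scores)
    # Context vector: weighted sum of the encoder states.
    context = weights @ encoder_states  # (enc_dim,)
    return context, weights

# Toy example with random parameters standing in for learned weights.
rng = np.random.default_rng(0)
src_len, enc_dim, dec_dim, attn_dim = 5, 8, 8, 16
encoder_states = rng.normal(size=(src_len, enc_dim))
decoder_state = rng.normal(size=(dec_dim,))
W_enc = rng.normal(size=(enc_dim, attn_dim))
W_dec = rng.normal(size=(dec_dim, attn_dim))
v = rng.normal(size=(attn_dim,))

context, weights = additive_attention(encoder_states, decoder_state, W_enc, W_dec, v)
print(weights)   # one weight per source position, summing to 1
print(context)   # context vector fed to the decoder at this step

The weights indicate how much each source position contributes to the current output word, and the context vector built from them is what the decoder conditions on, in place of a single fixed-length summary of the whole input sentence.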
