Attention in NLP
In this section, we will look at how the attention model is used in RNNs for machine translation.
The problem of machine translation can be formulated as an optimization problem over P(T|S), where S is the source sentence and T is the translated sentence. The machine translation system has two major components: an encoder and a decoder.
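One common way to write this objective down is sketched here. The symbols \hat{T} (the chosen translation) and t_i (the i-th target word) are notation assumed for illustration, and the word-by-word factorization is the standard chain rule that RNN decoders typically rely on, not something the text above states explicitly.

```latex
\hat{T} = \arg\max_{T} P(T \mid S),
\qquad
P(T \mid S) = \prod_{i=1}^{|T|} P(t_i \mid t_1, \dots, t_{i-1}, S)
```

Read this way, the decoder only has to score one target word at a time, conditioned on the source sentence and the words it has already produced.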
Given a source sentence S, we can unroll the RNN over its words: at each time step, the RNN takes one input word together with its previous state and updates its internal state. For example, in the following figure, the input words are fed into the RNN’s encoder; the state generated after the last word is essentially a vector representation of the entire sentence. Then, the decoder ...
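The encoder pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a vanilla tanh RNN cell, randomly initialized weights, and random vectors standing in for word embeddings; the names encode, W_xh, W_hh, and b_h are hypothetical and do not come from the text.

```python
import numpy as np

def encode(embedded_words, W_xh, W_hh, b_h):
    """Unroll a vanilla tanh RNN over a sentence and return every state."""
    h = np.zeros(W_hh.shape[0])  # initial state before any word is seen
    states = []
    for x in embedded_words:  # one time step per input word
        # New state depends on the current word and the previous state.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)  # shape: (sentence_len, hidden_dim)

# Toy setup (assumed sizes): 5 words, 4-dim embeddings, 3-dim state.
rng = np.random.default_rng(0)
embed_dim, hidden_dim, sentence_len = 4, 3, 5
sentence = rng.normal(size=(sentence_len, embed_dim))
W_xh = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

states = encode(sentence, W_xh, W_hh, b_h)
sentence_vector = states[-1]  # state after the last word
```

Here sentence_vector plays the role of the fixed-size sentence representation, while states keeps the per-word states that an attention mechanism would later draw on.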