While decoding, at each time step t, an intra-temporal attention function attends over specific parts of the encoded input sequence, conditioned on the decoder's hidden state and the words generated at earlier time steps (before t). This form of attention prevents the decoder from repeatedly attending to the same parts of the input sequence at different decoding time steps.
The attention score of the hidden input state h_i^e at the decoding time step t is given by e_{ti} = f(h_t^d, h_i^e), where h_t^d is the decoder's hidden state at step t and f is a learned scoring function.
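The following is a minimal NumPy sketch of this idea. It assumes a bilinear scoring function with a learned matrix W_attn, and it discourages repeated attention by dividing each new exponentiated score by the sum of the exponentiated scores the same input position received at earlier decoding steps; the function name, argument shapes, and this particular normalization are illustrative assumptions rather than details taken from the text.

```python
import numpy as np

def intra_temporal_attention(dec_hidden, enc_hiddens, W_attn, past_scores):
    """Sketch of intra-temporal attention over the encoder states.

    dec_hidden  : (d_dec,)       decoder hidden state h_t^d at step t
    enc_hiddens : (n, d_enc)     encoder hidden states h_1^e ... h_n^e
    W_attn      : (d_dec, d_enc) assumed bilinear attention weights
    past_scores : list of (n,)   raw scores from earlier decoding steps
    """
    # Raw attention scores e_{ti} = f(h_t^d, h_i^e), here a bilinear form
    scores = dec_hidden @ W_attn @ enc_hiddens.T          # shape (n,)

    if not past_scores:
        # First decoding step: no earlier steps to normalize against
        temporal = np.exp(scores)
    else:
        # Penalize input positions that already received high scores
        # at earlier decoding steps (temporal normalization)
        temporal = np.exp(scores) / np.sum(np.exp(np.stack(past_scores)), axis=0)

    # Normalize over input positions to get attention weights alpha_{ti}
    alphas = temporal / temporal.sum()

    # Context vector: attention-weighted sum of the encoder hidden states
    context = alphas @ enc_hiddens                        # shape (d_enc,)
    return context, alphas, scores
```

In use, the raw scores returned at each decoding step would be appended to past_scores before the next step, so that positions attended to earlier receive progressively smaller weights later in decoding.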