Intra-decoder attention
Even though an intra-temporal attention function ensures that different parts of the encoded input are attended to at each decoding step, the decoder can still generate repeated phrases when producing long sequences. To prevent this, information from the previously decoded sequence can also be fed into the decoder. Information from previous decoding steps helps the model avoid repeating the same information and leads to more structured predictions.
To incorporate information from previous decoding steps, an intra-decoder attention mechanism is applied. This approach is not used in current encoder-decoder models for abstractive summarization. For each time step t during decoding, ...
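The passage is cut off before the exact formulation, so the following is only a minimal sketch of the idea. It assumes a bilinear score e_tj = h_t^T W_attn h_j between the current decoder hidden state h_t and each earlier decoder hidden state h_j (j = 1, ..., t-1), a softmax over those scores, and a decoder context vector equal to the attention-weighted sum of the earlier states, with a zero vector at the first step when nothing has been decoded yet. The function name `intra_decoder_attention`, the matrix `W_attn`, and the plain-list vectors are hypothetical choices for illustration, not the book's implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [dot(row, v) for row in W]


def intra_decoder_attention(h_t, previous_states, W_attn):
    """Attend from the current decoder state over earlier decoder states.

    h_t: current decoder hidden state (length d).
    previous_states: decoder hidden states from steps 1..t-1.
    W_attn: d x d matrix for the (assumed) bilinear score h_t^T W_attn h_j.
    Returns (weights, context).
    """
    d = len(h_t)
    if not previous_states:
        # First decoding step: nothing has been decoded, so the context is zero.
        return [], [0.0] * d

    # Bilinear score between the current state and each earlier state.
    scores = [dot(h_t, matvec(W_attn, h_j)) for h_j in previous_states]

    # Numerically stable softmax over the earlier decoding steps only.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of earlier decoder states.
    context = [
        sum(w * h_j[k] for w, h_j in zip(weights, previous_states))
        for k in range(d)
    ]
    return weights, context


if __name__ == "__main__":
    # Hypothetical toy run: identity W_attn and 2-dimensional decoder states.
    W = [[1.0, 0.0], [0.0, 1.0]]
    history = []
    for h in [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]:
        weights, context = intra_decoder_attention(h, history, W)
        print([round(w, 3) for w in weights], [round(c, 3) for c in context])
        history.append(h)  # the state becomes visible to later steps
```

The key design point is that the attention only ranges over states the decoder has already produced, so the model can "see" what it has generated and down-weight repeating it. In a full model the returned context vector would typically be combined with the encoder-side context before predicting the next token; that wiring is likewise an assumption here.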