April 2018
Intermediate to advanced
334 pages
10h 18m
English
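The teacher forcing algorithm (Williams and Zipser, 1989) is the most widely used method for training a decoder RNN for sequence generation. At each decoding time step, the decoder is fed the ground truth token from the previous step rather than its own prediction, and training minimizes the maximum-likelihood loss, that is, the negative log-likelihood of the ground truth token at that step.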
Suppose $y^* = \{y^*_1, y^*_2, \ldots, y^*_T\}$ is defined as the ground truth output sequence for a given input sequence $x$. Then the maximum likelihood objective of supervised learning with the teacher forcing algorithm is to minimize the loss function given by the following:

$$L_{ml} = -\sum_{t=1}^{T} \log p(y^*_t \mid y^*_1, \ldots, y^*_{t-1}, x)$$
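
To make the training objective concrete, here is a minimal sketch in plain Python of how a teacher-forced loss could be computed. It assumes the loss is the negative log-likelihood of each ground truth token, summed over time steps. `ToyDecoder`, `teacher_forcing_loss`, the start token `bos_token`, and the random fixed weights are all hypothetical names and choices made for illustration; the initial hidden state stands in for an encoding of the input sequence x. It is not the implementation used in this book.

```python
import math
import random


def softmax(logits):
    """Turn a list of scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyDecoder:
    """Hypothetical single-layer RNN decoder with fixed random weights."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.embed = matrix(vocab_size, hidden_size)   # token embeddings
        self.w_hh = matrix(hidden_size, hidden_size)   # recurrent weights
        self.w_out = matrix(vocab_size, hidden_size)   # output projection

    def step(self, prev_token, hidden):
        """Consume the previous token, return (next-token probs, new state)."""
        x = self.embed[prev_token]
        new_hidden = [
            math.tanh(x[i] + sum(w * h for w, h in zip(self.w_hh[i], hidden)))
            for i in range(len(hidden))
        ]
        logits = [sum(w * h for w, h in zip(row, new_hidden))
                  for row in self.w_out]
        return softmax(logits), new_hidden


def teacher_forcing_loss(decoder, init_hidden, target, bos_token=0):
    """Sum of -log p(target[t] | target[:t], x) under teacher forcing."""
    hidden = init_hidden
    prev = bos_token
    loss = 0.0
    for y_t in target:
        probs, hidden = decoder.step(prev, hidden)
        loss -= math.log(probs[y_t])
        # Teacher forcing: the next input is the ground truth token,
        # not the decoder's own prediction.
        prev = y_t
    return loss


decoder = ToyDecoder(vocab_size=5, hidden_size=4)
loss = teacher_forcing_loss(decoder, [0.0] * 4, [2, 3, 1, 4])
print(loss)
```

In a real model the weights would be updated by backpropagating this loss. The key design point is the `prev = y_t` line: during training the decoder always conditions on the reference sequence, regardless of what it would have predicted itself.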

But such an objective of ...