January 2018
Beginner to intermediate
284 pages
8h 35m
English
The output of the last hidden layer of the CNN (the encoder) is fed to the decoder at its first time step. We set $x_1$ to the <START> vector, and the desired label $y_1$ is the first word of the sequence. Analogously, we set $x_2$ to the word vector of the first word and expect the network to predict the distribution of the second word. On the last step, $x_T$ is the word vector of the last word, and the target label $y_T$ is the <EOS> token. During training, the correct input is given to the decoder at every time step, even if the decoder made a mistake on an earlier step. The loss function is defined as the sum of the negative log-likelihoods of the ground-truth words: $L = -\sum_{t=1}^{T} \log p_t(y_t)$, where $p_t$ is the distribution over the vocabulary that the decoder predicts at step $t$.
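The input/target alignment and the loss described above can be illustrated with a small sketch. It is not the book's implementation: the helper names `teacher_forcing_pairs` and `sequence_nll`, the toy vocabulary, and the probability values are all made up for illustration, and a real decoder would produce the per-step distributions with a softmax layer. The sketch only shows that the inputs are the caption shifted right by one position (a scheme commonly called teacher forcing) and that the loss sums the negative log-probabilities of the correct words.

```python
import math

START, EOS = "<START>", "<EOS>"

def teacher_forcing_pairs(caption):
    """Build decoder inputs and targets from a caption (list of words).

    Inputs are the caption shifted right by one step: x_1 = <START>,
    x_{t+1} = word t. Targets are the caption followed by <EOS>.
    """
    inputs = [START] + list(caption)
    targets = list(caption) + [EOS]
    return inputs, targets

def sequence_nll(step_probs, target_ids):
    """Sum of negative log-likelihoods of the ground-truth words.

    step_probs[t] is the predicted distribution over the vocabulary at
    step t; target_ids[t] is the index of the correct word at step t.
    """
    return -sum(math.log(probs[idx])
                for probs, idx in zip(step_probs, target_ids))

# Hypothetical three-word caption and toy vocabulary.
inputs, targets = teacher_forcing_pairs(["a", "dog", "runs"])
vocab = {"a": 0, "dog": 1, "runs": 2, EOS: 3}
target_ids = [vocab[w] for w in targets]

# Made-up decoder outputs, one distribution per time step.
step_probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.50, 0.30],
    [0.05, 0.05, 0.10, 0.80],
]
loss = sequence_nll(step_probs, target_ids)
print(inputs, targets, loss)
```

Because the inputs come from the ground-truth caption rather than from the decoder's own predictions, an incorrect prediction at one step does not change what the decoder is fed at the next step during training.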