January 2019
Intermediate to advanced
342 pages
9h 17m
English
The model is trained with categorical cross-entropy loss to predict the target word at each time step of the decoder LSTM. At any time step, the loss is computed over all the words of the vocabulary and can be represented as follows:

$$C = -\sum_{i=1}^{N} y_i \log P_i$$

Here $N$ is the number of words in the vocabulary. The label $y_i$ represents the one-hot encoded version of the target word. Only the label corresponding to the actual word is one; the rest are zero. The term $P_i$ represents the probability that the actual target word is the word indexed by $i$. To get the ...
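The loss for a single decoder time step can be sketched in plain Python as follows. This is a minimal illustration rather than the book's implementation: the function name `categorical_cross_entropy`, the four-word vocabulary, and the probability values are all assumptions made up for the example, and the small `eps` clamp is added only to avoid taking the logarithm of zero.

```python
import math


def categorical_cross_entropy(one_hot_label, probs, eps=1e-12):
    """Loss for one time step: -sum_i y_i * log(P_i) over the vocabulary."""
    if len(one_hot_label) != len(probs):
        raise ValueError("label and probability vectors must have equal length")
    # Clamp each probability to eps so log(0) is never evaluated.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot_label, probs))


# Hypothetical toy vocabulary; a real vocabulary would be far larger.
vocab = ["<start>", "hello", "world", "<end>"]

# Assume the actual target word at this time step is "world" (index 2).
target_index = 2
label = [1.0 if i == target_index else 0.0 for i in range(len(vocab))]

# Hypothetical predicted distribution over the vocabulary (sums to 1).
probs = [0.1, 0.2, 0.6, 0.1]

loss = categorical_cross_entropy(label, probs)
print(loss)
```

Because the label is one-hot, every term except the one for the actual target word vanishes, so the loss reduces to the negative log of the probability the model assigned to that word.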