April 2018
Intermediate to advanced
334 pages
10h 18m
English
While deciding the output sequence token, the decoder decides either to use a softmax layer for token generation or to use a pointer mechanism to point at the rare important token in the input and copy that as an output sequence token. At each decoding step, a switch function is used to decide whether to use token generation or to use point to copy an input token.
is defined as a binary value, which is equal to 1 if the pointer mechanism is used, otherwise 0. Therefore, the probability of
as an output token is given ...