April 2018
Intermediate to advanced
334 pages
10h 18m
English
Since the maximum likelihood objective computes the probability of the next token based on the previously generated token and a reward metric such as ROUGE helps in the measurement of human readability through perplexity, both are used to derive a mixed learning objective function as follows:

Here,
is the scaling factor to balance the difference in magnitude of
and .