January 2020
Intermediate to advanced
432 pages
10h 18m
English
The first algorithm we will look at is known as REINFORCE. It introduces the concept of policy gradients (PG) in a very elegant manner, especially in PyTorch, whose automatic differentiation hides many of the mathematical complexities of the implementation. REINFORCE also solves the optimization problem in reverse. Rather than performing gradient ascent on the expected return, it negates the objective so that the problem can be expressed as a loss function and minimized with gradient descent. The update equation then becomes the following:
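
$$\theta \leftarrow \theta - \alpha \nabla_\theta L(\theta), \qquad L(\theta) = -\sum_{t} G_t \log \pi_\theta(a_t \mid s_t)$$
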
Here, we now assume the following:
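The loss-function view of REINFORCE can be sketched without PyTorch, which makes visible the gradient that autograd would normally compute for us. This is a minimal, stdlib-only sketch under stated assumptions: a hypothetical two-armed bandit (each episode is a single step, so the return G equals the immediate reward), a softmax policy with one logit per action, and illustrative helper names (`softmax`, `reinforce_loss_and_grad`, `train`) that do not come from the book.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_loss_and_grad(theta, action, ret):
    """REINFORCE loss -G * log pi(a) for a softmax policy, plus its gradient.

    For a softmax policy, d/d theta_k log pi(a) = 1[k == a] - pi_k, so the
    gradient of the loss is -G * (onehot(a) - pi). In PyTorch, autograd
    would derive this; here it is written out by hand.
    """
    probs = softmax(theta)
    loss = -ret * math.log(probs[action])
    grad = [-ret * ((1.0 if k == action else 0.0) - p) for k, p in enumerate(probs)]
    return loss, grad


def train(episodes=500, lr=0.1, seed=0):
    # Hypothetical two-armed bandit (an assumption for illustration):
    # arm 0 always pays -1.0, arm 1 always pays +1.0.
    rewards = [-1.0, 1.0]
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # policy parameters: one logit per action
    for _ in range(episodes):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = rewards[action]  # one-step episode, so G = reward
        _, grad = reinforce_loss_and_grad(theta, action, ret)
        # Gradient DESCENT on the loss == gradient ASCENT on expected return.
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return softmax(theta)


if __name__ == "__main__":
    # Final action probabilities; most of the mass should be on arm 1.
    print(train())
```

Descending the gradient of $-G \log \pi_\theta(a)$ moves the parameters in the same direction that ascent on the expected return would, which is the "reverse" framing described above. In PyTorch one would typically build the same loss from the policy's log-probabilities and call `loss.backward()` followed by an optimizer step, rather than writing the gradient by hand.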