October 2018
Beginner
362 pages
9h 32m
English
In Chapter 8, Reinforcement Learning, we learned about how to use policy optimization methods for continuous action spaces. Policy optimization methods learn directly by optimizing a policy from actions taken in their environment, as explained in the following diagram:

Remember, policy gradient methods are off-policy, meaning that their behavior in a certain moment is not necessarily reflective of the policy they are abiding by. These policy gradient algorithms utilize policy iteration, where they evaluate the given policy and follow the policy gradient in order to learn an optimal policy. ...
Read now
Unlock full access