Exploring policy gradient methods

The policy gradient is a class of reinforcement learning algorithms based on the use of parameterized policies. The idea is to calculate the expected return gradient (reward) with respect to each parameter in order to change the parameters in a direction that increases the performance of the same. This method doesn't show the problems of traditional reinforcement learning such as the lack of guarantees of a value function, the problem resulting from the uncertainty of the state, and the complexities that arise from states and actions in continuous spaces. In the policy search method, no value function is used or estimated. The value function can be used to learn the policy parameter; however, it won't necessary ...

Get Hands-On Reinforcement Learning with R now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.