Policy search

In the policy search approach, a parameterized policy is stored, but no value function is used or estimated. Policy search methods can rely on roll-out policies that use trajectory-based sampling. Other policy search methods use optimization techniques such as evolutionary algorithms to search for optimal policy parameters. Policy gradient methods are examples of policy search. Let's look at these in detail.

Get Hands-On Reinforcement Learning with R now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.