April 2018
The goal of policy optimization is to find a stochastic policy, that is, a distribution over actions for a given state, that maximizes the expected sum of rewards. The method searches for the policy directly. The basic approach is to create a neural network (the policy network) that takes state information as input and outputs a distribution over the possible actions the agent might take.
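To make the idea of a policy network concrete, here is a minimal sketch using only the Python standard library. It is not the book's implementation: the class name `PolicyNetwork`, the single tanh hidden layer, the omission of bias terms, and the layer sizes are all assumptions chosen for illustration. The sketch covers only the forward pass (state in, action distribution out) and sampling from that distribution; it does no training.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PolicyNetwork:
    """Hypothetical one-hidden-layer policy network (illustrative only)."""

    def __init__(self, state_dim, n_actions, hidden=8, seed=0):
        self.rng = random.Random(seed)
        # Small random weights; biases are left out to keep the sketch short.
        self.w1 = [[self.rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                   for _ in range(hidden)]
        self.w2 = [[self.rng.gauss(0.0, 0.1) for _ in range(hidden)]
                   for _ in range(n_actions)]

    def action_probs(self, state):
        """Map a state vector to a distribution over actions."""
        hidden = [math.tanh(sum(w * s for w, s in zip(row, state)))
                  for row in self.w1]
        logits = [sum(w * h for w, h in zip(row, hidden))
                  for row in self.w2]
        return softmax(logits)

    def sample_action(self, state):
        """Draw one action index according to the policy's distribution."""
        probs = self.action_probs(state)
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Example: a 4-dimensional state and 2 possible actions (assumed sizes).
policy = PolicyNetwork(state_dim=4, n_actions=2)
state = [0.1, -0.2, 0.05, 0.0]
probs = policy.action_probs(state)
action = policy.sample_action(state)
```

Because the policy is stochastic, `sample_action` can return different actions for the same state across calls. Policy optimization would then adjust the weights so that actions leading to higher expected reward become more probable.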
The two major components of policy optimization are:
vector, ...