January 2018
Beginner to intermediate
284 pages
8h 35m
English
In contrast to value-learning-based algorithms, policy search-based methods search directly for an optimal policy π* in the policy space. This is typically done by parameterizing the policy as πθ, where the parameter θ is updated to maximize the expected reward E[r | θ]. Introducing the parameter θ also adds prior information to the policy search, which restricts the search space.
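As a rough illustration of updating θ to maximize E[r | θ], the sketch below runs gradient ascent on a softmax policy for a small multi-armed bandit. The bandit, its reward values in `MEAN_REWARDS`, the softmax parameterization, and the function names are all assumptions made for this example, not something taken from the text. To keep the result deterministic, it uses the exact gradient of the expected reward rather than the sampled estimates a practical method would rely on.

```python
import math

# Hypothetical 3-armed bandit: assumed mean reward for each action.
MEAN_REWARDS = [0.2, 0.5, 0.9]

def softmax_policy(theta):
    """Return action probabilities pi_theta(a) as a softmax over theta."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(theta):
    """E[r | theta] = sum over actions of pi_theta(a) * mean_reward(a)."""
    probs = softmax_policy(theta)
    return sum(p * r for p, r in zip(probs, MEAN_REWARDS))

def gradient(theta):
    """Exact gradient of E[r | theta] for a softmax policy:
    dE/dtheta_k = pi_k * (r_k - E[r | theta])."""
    probs = softmax_policy(theta)
    baseline = expected_reward(theta)
    return [p * (r - baseline) for p, r in zip(probs, MEAN_REWARDS)]

def policy_search(steps=2000, learning_rate=0.5):
    """Gradient ascent on theta, starting from a uniform policy."""
    theta = [0.0] * len(MEAN_REWARDS)
    for _ in range(steps):
        grad = gradient(theta)
        theta = [t + learning_rate * g for t, g in zip(theta, grad)]
    return theta

theta = policy_search()
probs = softmax_policy(theta)
best_action = max(range(len(probs)), key=probs.__getitem__)
```

Under these assumptions the policy gradually concentrates its probability on the action with the highest mean reward. In a real task the expected reward is not available in closed form, so the gradient would have to be estimated from sampled episodes.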
Such techniques are often used when the task is well understood and prior knowledge can be integrated into the learning problem. Policy-based algorithms can be further subdivided into two categories: