October 2019
Intermediate to advanced
366 pages
12h 4m
English
The main idea behind PPO is to clip the surrogate objective function when the new policy moves too far from the old one, instead of imposing an explicit constraint on the update as TRPO does. This prevents the policy from making updates that are too large. The main objective is as follows:
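$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right] \qquad (7.9)$$
Here, $\hat{A}_t$ is the advantage estimate at time step $t$, and the probability ratio, $r_t(\theta)$, is defined as follows:
$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} \qquad (7.10)$$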
What the objective is saying is that if the probability ratio, $r_t(\theta)$, between the new and the old policy moves outside the interval $[1-\epsilon, 1+\epsilon]$ set by a constant, $\epsilon$, then the minimum ...
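To make the clipping concrete, the following is a minimal NumPy sketch of the clipped surrogate objective for a batch of samples. It is an illustration under stated assumptions, not the book's implementation: the function name `clipped_surrogate`, its parameter names, and the default `epsilon=0.2` are hypothetical choices made here, and the ratio is computed from log-probabilities, which is a common but assumed convention.

```python
import numpy as np


def clipped_surrogate(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Clipped surrogate objective averaged over a batch (hypothetical helper).

    new_log_probs / old_log_probs: log-probabilities of the taken actions
    under the new and the old policy. advantages: advantage estimates.
    epsilon: clipping constant (0.2 is an assumed default).
    """
    new_log_probs = np.asarray(new_log_probs, dtype=np.float64)
    old_log_probs = np.asarray(old_log_probs, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)

    # Probability ratio between the new and the old policy.
    ratio = np.exp(new_log_probs - old_log_probs)

    # Unclipped term and the term with the ratio clipped to [1 - eps, 1 + eps].
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages

    # Element-wise minimum, then the empirical mean over the batch.
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    # Two samples: one ratio above 1 + epsilon, one below 1 - epsilon.
    new_lp = np.log([1.5, 0.5])
    old_lp = np.zeros(2)
    adv = [1.0, -1.0]
    print(clipped_surrogate(new_lp, old_lp, adv))
```

In this sketch, a ratio of 1.5 paired with a positive advantage contributes as if the ratio were 1.2, so the objective gives no extra credit for moving further from the old policy. In training code the value would normally be maximized (or its negative minimized) with an automatic-differentiation library rather than plain NumPy.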