October 2019
Intermediate to advanced
366 pages
12h 4m
English
The practical algorithm introduced in the PPO paper uses a truncated version of Generalized Advantage Estimation (GAE), an idea first proposed in the paper "High-Dimensional Continuous Control Using Generalized Advantage Estimation". GAE calculates the advantage as follows:
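$$\hat{A}_t = \delta_t + (\gamma\lambda)\,\delta_{t+1} + \cdots + (\gamma\lambda)^{T-t-1}\,\delta_{T-1}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) \tag{7.11}$$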
GAE is used in place of the common advantage estimator:
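$$\hat{A}_t = -V(s_t) + r_t + \gamma r_{t+1} + \cdots + \gamma^{T-t-1} r_{T-1} + \gamma^{T-t} V(s_T) \tag{7.12}$$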
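The following minimal Python sketch makes the two estimators concrete. It assumes a single trajectory segment of T steps with no episode boundary inside it, a list of T rewards, and T + 1 value estimates V(s_0), ..., V(s_T). The function names gae_advantages and nstep_advantages are hypothetical. GAE is computed with the backward recursion A_t = delta_t + gamma * lambda * A_{t+1}, which expands to the sum in (7.11).

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float, lam: float) -> List[float]:
    """Truncated GAE, as in (7.11), for one segment of length T.

    Assumption: `values` holds V(s_0), ..., V(s_T), one more entry than
    `rewards`, and no episode ends inside the segment.
    """
    T = len(rewards)
    if len(values) != T + 1:
        raise ValueError("values must contain T + 1 entries, including V(s_T)")
    advantages = [0.0] * T
    running = 0.0
    # Walk backwards in time: A_t = delta_t + gamma * lam * A_{t+1}, with A_T = 0.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error delta_t
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def nstep_advantages(rewards: List[float], values: List[float],
                     gamma: float) -> List[float]:
    """Common estimator, as in (7.12): discounted rewards up to the horizon,
    bootstrapped with V(s_T), minus the baseline V(s_t)."""
    T = len(rewards)
    advantages = []
    for t in range(T):
        ret = sum(gamma ** l * rewards[t + l] for l in range(T - t))
        ret += gamma ** (T - t) * values[T]
        advantages.append(ret - values[t])
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    values = [0.5, 0.4, 0.3, 0.2]  # V(s_0)..V(s_3); the last one is V(s_T)
    print([round(a, 4) for a in gae_advantages(rewards, values, 0.9, 1.0)])
    print([round(a, 4) for a in nstep_advantages(rewards, values, 0.9)])
```

With lam=1.0 the sum of TD errors telescopes into (7.12), so both functions return the same values. With lam=0.0 each advantage reduces to the one-step TD error delta_t. Values in between trade some bias for lower variance.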
Continuing with the PPO algorithm, on each iteration N trajectories with time horizon T are collected from multiple parallel actors, and the policy ...
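The N trajectories of horizon T gathered by the parallel actors can be processed in a single backward pass over time. The sketch below makes several assumptions. Every actor contributes exactly T steps, and no episode ends inside a segment. Values are stored as N rows of T + 1 estimates, with the last entry being V(s_T). The function name batched_gae and the actor-major flattening into N * T samples are illustrative choices and are not taken from the source.

```python
from typing import List

Matrix = List[List[float]]


def batched_gae(rewards: Matrix, values: Matrix,
                gamma: float, lam: float) -> List[float]:
    """Truncated GAE for N actors, each with a segment of horizon T.

    rewards: N rows of T rewards, one row per parallel actor.
    values:  N rows of T + 1 value estimates (last entry is V(s_T)).
    Returns a flat, actor-major list of N * T advantages.
    """
    n_actors = len(rewards)
    horizon = len(rewards[0])
    for r_row, v_row in zip(rewards, values):
        if len(r_row) != horizon or len(v_row) != horizon + 1:
            raise ValueError("each actor needs T rewards and T + 1 values")

    adv = [[0.0] * horizon for _ in range(n_actors)]
    running = [0.0] * n_actors  # A_{t+1} for each actor, starts at 0 beyond T
    # Step all actors back through time together, as a vectorized version would.
    for t in reversed(range(horizon)):
        for i in range(n_actors):
            delta = rewards[i][t] + gamma * values[i][t + 1] - values[i][t]
            running[i] = delta + gamma * lam * running[i]
            adv[i][t] = running[i]

    # Flatten actor by actor into one batch of N * T samples.
    return [a for row in adv for a in row]
```

Practical implementations usually keep these as arrays of shape (N, T) and vectorize the inner loop. Plain lists are used here only to keep the sketch dependency-free.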