October 2018
Beginner
362 pages
9h 32m
English
One common way to compute policy gradients is with the Reinforce algorithm. Reinforce is a Monte-Carlo policy gradient method that uses likelihood ratios to estimate the value of a policy at a given point. The algorithm can lead to high variance.
Vanilla policy gradient methods can be challenging as they are extremely sensitive to what you choose for your step size parameter. Choose a step size too big and the correct policy is overwhelmed by noise – too small and the training becomes incredibly slow. Our next class of reinforcement learning algorithms, proximal policy optimization (PPO), seeks to remedy this shortcoming of policy gradients. PPO is a new class of reinforcement learning algorithms that was ...
Read now
Unlock full access