June 2018
Intermediate to advanced
546 pages
13h 30m
English
In Chapter 9, Policy Gradients – An Alternative, we started to investigate policy-based methods, an alternative to the familiar family of value-based methods. In particular, we focused on the REINFORCE method and its modification, which uses the discounted reward to obtain the gradient of the policy (that is, the direction in which to improve the policy). Both methods worked well for the small CartPole problem, but for the more complicated Pong environment, convergence was painfully slow.
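As a quick reminder of what "using a discounted reward to obtain the gradient" means, here is a minimal sketch in plain Python. The function names `discounted_returns` and `reinforce_loss` and the default `gamma` value are illustrative assumptions, not code from the book; a real implementation would compute the log-probabilities with a neural network and let an autograd library differentiate the loss.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return G_t for every step of one episode.

    Hypothetical helper for illustration; gamma=0.99 is an assumed default.
    """
    returns = []
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it:
    # G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Negative sum of log pi(a_t | s_t) * G_t over one episode.

    Minimizing this value with gradient descent moves the policy
    in the direction of the REINFORCE policy gradient estimate.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

For example, `discounted_returns([1, 1, 1], gamma=0.5)` gives `[1.75, 1.5, 1.0]`: early steps are credited with the rewards that follow them, while later rewards count for less the further away they are.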
In this chapter, we'll discuss one more extension to the vanilla Policy Gradient (PG) method, which substantially improves the stability and convergence speed of the resulting method. Despite the modification being only minor, the ...