June 2018
Intermediate to advanced
546 pages
13h 30m
English
In this chapter, we discussed why PG methods need to gather training data from multiple environments: they are on-policy, so they cannot reuse old experience from a replay buffer. We also implemented two different approaches to A3C to parallelize and stabilize the training process. Parallelization will come up again later in this book, when we discuss black-box methods (Chapter 16, Black-Box Optimization in RL). In the upcoming chapters, we'll look at practical problems that can be solved using PG methods, which will wrap up the PG part of the book.