June 2018
Intermediate to advanced
546 pages
13h 30m
English
This chapter is dedicated to an extension of the Advantage Actor-Critic (A2C) method that we discussed in detail in the previous chapter. The extension adds true asynchronous environment interaction. Its full name is Asynchronous Advantage Actor-Critic, normally abbreviated to A3C. This method is one of the most widely used by RL practitioners. We will look at two approaches for adding asynchronous behavior to the basic A2C method.
One way to improve the stability of the Policy Gradient (PG) family of methods is to use multiple environments in parallel. The reason for this is the fundamental problem we discussed in Chapter 6, Deep Q-Networks ...