Using A3C

Running multiple actor-critic workers provides a training advantage: more sampling variation, which should reduce the expected error and, in turn, improve training performance. Mathematically, all we are doing is drawing from a larger sample which, as any statistician will tell you, reduces the sampling error. If we further assume that each worker is asynchronous, meaning it updates the global model in its own time, we also gain statistical variability in our sampling across the entire trajectory space, and this can happen along the sampling space at the same time. In essence, we could have workers sampling the trajectory at many different points, as shown in the following ...
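To make the idea concrete, here is a minimal sketch (not the book's code) of several asynchronous workers pushing updates to a shared global model in their own time. The `GlobalModel` class, the random "gradient" deltas, and the parameter names are all illustrative assumptions; a real A3C implementation would compute actor-critic gradients from sampled trajectories instead.

```python
import threading
import random

class GlobalModel:
    """Hypothetical shared model that workers update asynchronously."""
    def __init__(self, n_params):
        self.params = [0.0] * n_params
        self.updates = 0
        self.lock = threading.Lock()

    def apply_update(self, delta, lr=0.1):
        # Each worker applies its update whenever its rollout finishes,
        # so updates arrive interleaved rather than in lockstep.
        with self.lock:
            self.params = [p + lr * d for p, d in zip(self.params, delta)]
            self.updates += 1

def worker(global_model, n_rollouts, seed):
    rng = random.Random(seed)
    for _ in range(n_rollouts):
        # Each worker samples its own region of the trajectory space;
        # here a random delta stands in for a real policy/value gradient.
        delta = [rng.gauss(0.0, 1.0) for _ in global_model.params]
        global_model.apply_update(delta)

model = GlobalModel(n_params=4)
threads = [threading.Thread(target=worker, args=(model, 5, s))
           for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(model.updates)  # 4 workers x 5 rollouts = 20 asynchronous updates
```

Because the workers run independently, the global model receives updates drawn from several different sampling points at once, which is the source of the extra variability the text describes.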
