Synchronous actor-critic workers provide a training advantage by generating more varied samples, which should, in turn, reduce the expected error and thus improve training performance. Mathematically, running multiple workers simply enlarges the sample size and, as any statistician will tell you, a larger sample reduces sampling error. However, if we instead assume that each worker is asynchronous, meaning it updates the global model in its own time, we also gain statistical variability across the entire trajectory space, on top of the variability across the sampling space. In essence, we could have workers sampling the trajectory at many different points, as shown in the following ...
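To make the idea concrete, here is a minimal, hypothetical sketch of asynchronous workers: several threads share one global model and each applies its own noisy gradient updates in its own time, Hogwild-style, rather than waiting for the others. The toy quadratic loss, the `TARGET` constant, and the worker count are illustrative assumptions, not part of the actual actor-critic algorithm.

```python
import threading
import random

# Toy stand-in for a global actor-critic model: a single shared
# parameter that all workers read and update asynchronously.
TARGET = 3.0            # hypothetical optimum the "loss" pulls toward
N_WORKERS = 4
STEPS_PER_WORKER = 2000
LR = 0.01

global_model = {"w": 0.0}   # shared state, updated without a lock

def worker(seed: int) -> None:
    rng = random.Random(seed)
    for _ in range(STEPS_PER_WORKER):
        # Each worker draws its own noisy sample: here, a gradient
        # of the quadratic loss (w - TARGET)^2 plus sampling noise,
        # standing in for a gradient estimated from its own trajectory.
        w = global_model["w"]                     # possibly stale read
        grad = 2.0 * (w - TARGET) + rng.gauss(0.0, 0.5)
        global_model["w"] = w - LR * grad         # update in its own time

threads = [threading.Thread(target=worker, args=(s,)) for s in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(global_model["w"])   # converges near TARGET despite the races
```

Because each worker starts from a different random seed and reads the global model at a different moment, the updates arrive from many different points in the trajectory space at once, which is exactly the extra variability described above.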