Asynchronous n-step Q-learning

The architecture of asynchronous n-step Q-learning is broadly similar to that of asynchronous one-step Q-learning. The difference is that the learning agent's actions are selected using the exploration policy for up to n steps, or until a terminal state is reached, in order to compute a single update of the network parameters. This process collects n rewards from the environment since the last update. Then, for each time step, the loss is calculated as the difference between the discounted n-step return at that state and the current Q-value estimate; the resulting gradients are accumulated over the n steps and applied as a single update.
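The n-step return computation described above can be sketched as follows. This is an illustrative helper, not code from the book: it walks backward through the collected rewards, bootstrapping from the Q-value of the state reached after n steps (zero if that state was terminal), and yields one target per time step.

```python
def n_step_q_targets(rewards, bootstrap_q, gamma=0.99):
    """Compute n-step Q-learning targets for one rollout.

    rewards     : list of rewards r_t, ..., r_{t+n-1} collected under
                  the exploration policy since the last update.
    bootstrap_q : max_a Q(s_{t+n}, a) from the target network, or 0.0
                  if the rollout ended in a terminal state.
    Returns one discounted-return target per time step, oldest first.
    """
    R = bootstrap_q
    targets = []
    # Accumulate the return backward: R <- r + gamma * R at each step.
    for r in reversed(rewards):
        R = r + gamma * R
        targets.append(R)
    targets.reverse()
    return targets
```

The squared difference between each target and the corresponding Q-value estimate gives the per-step loss; the gradients over all n steps are summed before the single parameter update.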
