April 2018
Intermediate to advanced
334 pages
10h 18m
English
The architecture of asynchronous n-step Q-learning is, to an extent, similar to that of asynchronous one-step Q-learning. The difference is that the learning agent selects actions using its exploration policy for up to $t_{\max}$ steps, or until a terminal state is reached, before computing a single update of the policy network parameters. As a result, the agent receives up to $t_{\max}$ rewards from the environment since its last update. Then, for each time step, the loss is calculated as the difference between the discounted future rewards at that ...
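The rollout-and-update loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the book's implementation. `collect_rollout`, `n_step_targets` and `n_step_loss` are hypothetical helper names. `policy` and `env_step` are hypothetical callables that stand in for an exploration policy and an environment. For a non-terminal rollout, the bootstrap value is assumed to come from a target-network estimate of max_a Q(s, a). The per-step loss is assumed to take the usual squared-error form between the n-step target and the predicted Q-value.

```python
from typing import Any, Callable, List, Sequence, Tuple


def collect_rollout(
    state: Any,
    policy: Callable[[Any], Any],
    env_step: Callable[[Any, Any], Tuple[Any, float, bool]],
    t_max: int,
) -> Tuple[List[Any], List[Any], List[float], Any, bool]:
    """Act with the exploration policy for up to t_max steps or until a terminal state.

    `policy` and `env_step` are hypothetical stand-ins for an exploration
    policy (e.g. epsilon-greedy) and an environment transition function.
    """
    states, actions, rewards = [], [], []
    done = False
    for _ in range(t_max):
        action = policy(state)
        next_state, reward, done = env_step(state, action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state
        if done:  # stop early when a terminal state is reached
            break
    # Returns the visited states, chosen actions, collected rewards,
    # the state reached last, and whether that state is terminal.
    return states, actions, rewards, state, done


def n_step_targets(
    rewards: Sequence[float], bootstrap_value: float, gamma: float
) -> List[float]:
    """Compute the n-step return target for every step of one rollout.

    bootstrap_value: 0.0 if the rollout ended in a terminal state, otherwise
    an estimate of max_a Q(s_last, a) (assumed to come from a target network).
    """
    targets = [0.0] * len(rewards)
    R = bootstrap_value
    # Walk backwards: the last state gets a one-step target, the one before
    # it a two-step target, and so on, up to len(rewards) steps.
    for i in reversed(range(len(rewards))):
        R = rewards[i] + gamma * R
        targets[i] = R
    return targets


def n_step_loss(q_values: Sequence[float], targets: Sequence[float]) -> float:
    """Sum of squared differences between n-step targets and predicted Q(s_i, a_i)."""
    return sum((R - q) ** 2 for R, q in zip(targets, q_values))
```

For example, `n_step_targets([1.0, 0.0, 1.0], 2.0, 0.5)` returns `[1.5, 1.0, 2.0]`: the final state receives a one-step target and the first state a three-step target. Computing the targets backwards reuses each partial return, so the whole rollout is processed in a single pass.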