December 2018
To further weaken the feedback loop between the current network parameters and the neural network weight updates, DeepMind extended the algorithm in Human-level control through deep reinforcement learning (2015: https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf) to use a slowly changing target network.
The target network has the same architecture as the Q-network, but its weights, θ⁻, are updated only periodically: every λ steps they are copied from the Q-network, and they are held constant otherwise. The target network generates the TD target predictions, that is, it takes the place of the Q-network to estimate:
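The periodic copy can be illustrated with a minimal sketch. It assumes a tabular stand-in for the Q-network so that it runs without a deep learning library; the names `TabularQ`, `td_target`, `sync_target`, and `train`, the learning rate `lr`, and the parameter `update_every` (playing the role of λ) are hypothetical and not taken from the original. The point it shows is that the TD target is computed from the frozen copy, while only the online weights change at every step.

```python
import copy


class TabularQ:
    """Hypothetical stand-in for a Q-network: one row of Q-values per state."""

    def __init__(self, n_states, n_actions):
        self.weights = [[0.0] * n_actions for _ in range(n_states)]

    def predict(self, state):
        return self.weights[state]


def td_target(target_net, reward, next_state, done, gamma=0.99):
    """Compute the TD target from the frozen target network, not the online one."""
    if done:
        return reward
    return reward + gamma * max(target_net.predict(next_state))


def sync_target(online_net, target_net):
    """Copy the online weights into the target network."""
    target_net.weights = copy.deepcopy(online_net.weights)


def train(transitions, n_states, n_actions, update_every=4, lr=0.5, gamma=0.99):
    """Update the online table on each transition; sync the target every `update_every` steps."""
    online = TabularQ(n_states, n_actions)
    target = TabularQ(n_states, n_actions)
    sync_target(online, target)  # both start from identical weights
    for step, (s, a, r, s_next, done) in enumerate(transitions, start=1):
        y = td_target(target, r, s_next, done, gamma)
        # Move the online estimate toward the target; the target net is untouched here.
        online.weights[s][a] += lr * (y - online.weights[s][a])
        if step % update_every == 0:
            sync_target(online, target)
    return online, target
```

With a large `update_every`, value estimates learned by the online table do not feed back into the targets until the next copy. With `update_every=1`, the sketch reduces to using the online weights for the targets after each step.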