Separate target network to compute the target Q-values

A separate network for generating target Q-values is a distinctive feature of deep Q-networks. The Q-values produced by this target network are used to compute the loss after every action the agent takes during training. Two networks are used instead of one because the primary Q-network's weights shift at every training step, so Q-values generated from it alone would form a constantly moving, unstable target.
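As a concrete illustration of how the target network feeds the loss, the sketch below computes the standard Bellman targets y = r + γ · max_a′ Q_target(s′, a′) for a toy batch. This is a minimal NumPy sketch, not the book's code; the function name `td_targets` and the example values are illustrative assumptions.

```python
import numpy as np

def td_targets(rewards, next_q_target, dones, gamma=0.99):
    # Bellman targets: y = r + gamma * max_a' Q_target(s', a'),
    # with the bootstrap term zeroed out at terminal states.
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

# Toy batch of 2 transitions with 3 actions each (illustrative values).
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])          # second transition ends the episode
next_q_target = np.array([[0.5, 2.0, 1.0],   # Q_target(s', .) for transition 1
                          [3.0, 0.0, 0.0]])  # ignored: terminal state
y = td_targets(rewards, next_q_target, dones)
# y -> [2.98, 0.0]: 1.0 + 0.99 * 2.0 for the first transition,
# just the reward for the terminal one.
```

The loss is then computed between these targets and the primary network's Q-values for the actions actually taken, so gradients flow only through the primary network.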

To obtain stable Q-values, a second neural network is used whose weights change slowly relative to the primary Q-network, which makes the training process more stable. This was also published in a post ( ...
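The "slowly changing weights" idea can be realized with a soft update, where the target weights move a small step toward the primary weights each training step. This is one common scheme sketched below under assumed names (`soft_update`, `tau`); another option the text is compatible with is copying the weights wholesale every N steps.

```python
import numpy as np

def soft_update(target_weights, primary_weights, tau=0.001):
    # Blend each target weight tensor toward the corresponding
    # primary weight tensor: theta_target <- tau*theta + (1-tau)*theta_target.
    return [tau * w + (1.0 - tau) * tw
            for w, tw in zip(primary_weights, target_weights)]

# Toy weights: one 2x2 layer (illustrative values).
primary = [np.ones((2, 2))]
target = [np.zeros((2, 2))]
target = soft_update(target, primary, tau=0.1)
# Each target entry has moved 10% of the way toward the primary value.
```

With a small `tau`, the target network lags the primary network by design, which is exactly what keeps the training targets stable.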
