October 2018
Intermediate to advanced
368 pages
9h 20m
English
In DQN, the target Q-Network both selects and evaluates every action, which results in an overestimation of the Q value. To resolve this issue, DDQN [3] proposes using the Q-Network to choose the action and the target Q-Network to evaluate it.
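The overestimation can be illustrated with a small numerical sketch. It assumes an idealized setting that is not taken from the text: every true Q value is 0, and each estimate is the true value plus independent Gaussian noise. The function name `max_of_noisy_estimates` and its parameters are hypothetical. In a real DDQN the two networks are correlated rather than independent, so this only shows the direction of the effect.

```python
import numpy as np

def max_of_noisy_estimates(n_actions=4, noise_std=1.0, n_trials=10000, seed=0):
    """Compare two ways of estimating max_a Q(a) when every true Q value is 0."""
    rng = np.random.default_rng(seed)

    # Each row is one noisy estimate of the (all-zero) true Q values.
    q_select = rng.normal(0.0, noise_std, size=(n_trials, n_actions))

    # Same estimates select AND evaluate the action (the DQN-style target).
    same_estimator = q_select.max(axis=1).mean()

    # A second, independent set of estimates evaluates the selected action
    # (an idealized version of the DDQN idea).
    q_evaluate = rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    chosen = q_select.argmax(axis=1)
    two_estimators = q_evaluate[np.arange(n_trials), chosen].mean()

    return same_estimator, two_estimators

same_estimator, two_estimators = max_of_noisy_estimates()
print(f"select and evaluate with the same estimates: {same_estimator:.3f}")
print(f"select with one set, evaluate with another:  {two_estimators:.3f}")
```

Under these assumptions the first number is clearly positive even though the true maximum is 0, while the second stays close to 0.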
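In DQN, as summarized by Algorithm 9.6.1, the estimate of the Q value in line 10 is:

$$Q_{max} = \begin{cases} r_{j+1} & \text{if the episode terminates at } j+1 \\ r_{j+1} + \gamma \max_{a_{j+1}} Q_{target}(s_{j+1}, a_{j+1}; \theta^-) & \text{otherwise} \end{cases}$$

$Q_{target}$ both chooses and evaluates the action $a_{j+1}$.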
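DDQN proposes changing line 10 to:

$$Q_{max} = \begin{cases} r_{j+1} & \text{if the episode terminates at } j+1 \\ r_{j+1} + \gamma \, Q_{target}\left(s_{j+1}, \arg\max_{a_{j+1}} Q(s_{j+1}, a_{j+1}; \theta); \theta^-\right) & \text{otherwise} \end{cases}$$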
The term $\arg\max_{a_{j+1}} Q(s_{j+1}, a_{j+1}; \theta)$ lets Q ...
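The difference between the two targets can be sketched with NumPy arrays standing in for network outputs. This is a minimal sketch, not the book's listing: the function names `dqn_target` and `ddqn_target`, the discount `gamma=0.9`, and the sample Q values are all assumptions made for illustration. In practice, `q_next` and `q_target_next` would come from the Q-Network and the target Q-Network evaluated on the next states.

```python
import numpy as np

def dqn_target(reward, done, q_target_next, gamma=0.9):
    """DQN: the target network both selects and evaluates the next action."""
    max_q = q_target_next.max(axis=1)
    # (1 - done) zeroes the bootstrap term for terminal transitions.
    return reward + gamma * max_q * (1.0 - done)

def ddqn_target(reward, done, q_next, q_target_next, gamma=0.9):
    """DDQN: the Q-Network selects the action, the target network evaluates it."""
    best_action = q_next.argmax(axis=1)
    chosen_q = q_target_next[np.arange(len(best_action)), best_action]
    return reward + gamma * chosen_q * (1.0 - done)

# Hypothetical batch of two transitions with two actions each.
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])            # the second transition ends the episode
q_next = np.array([[0.2, 0.8],         # Q-Network outputs for the next states
                   [0.5, 0.1]])
q_target_next = np.array([[1.0, 0.5],  # target Q-Network outputs for the next states
                          [0.3, 0.9]])

print(dqn_target(reward, done, q_target_next))           # [1.9  0. ]
print(ddqn_target(reward, done, q_next, q_target_next))  # [1.45 0.  ]
```

For the first transition, DQN bootstraps from the target network's own maximum (1.0), while DDQN bootstraps from the target network's value (0.5) for the action the Q-Network prefers, which gives a smaller target.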