January 2018
Beginner to intermediate
284 pages
8h 35m
English
Notice that in the DQN equation from the previous section, the max operator in $Y_t^{DQN}$ uses the same values both to select and to evaluate an action. This can be seen more clearly by rewriting the DQN target as follows:
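As a sketch of that rewriting, assuming the notation common in the Double DQN literature (the symbols $R_{t+1}$, $S_{t+1}$, $\gamma$, and the target-network parameters $\theta_t^-$ are assumptions here and may differ from those used in the previous section), the max can be split into an argmax that selects the action and an outer Q that evaluates it:

```latex
Y_t^{DQN} = R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a; \theta_t^-)
          = R_{t+1} + \gamma \, Q\big(S_{t+1}, \arg\max_{a} Q(S_{t+1}, a; \theta_t^-); \theta_t^-\big)
```

Written this way, the same parameters appear in both the inner selection and the outer evaluation.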
This often results in overestimated values, that is, overoptimistic Q-value estimates. To illustrate this with an example, consider a scenario in which every action in a set has the same optimal Q value. Because the estimates produced by Q-learning are imperfect, the estimated Q-values will lie above or below that optimal value. Due to the max operator in the equation, we select the action with ...
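The scenario above can be checked numerically. The following is a minimal sketch, not taken from the book: the function name `simulate_max_bias`, the Gaussian noise model, and all parameter values are illustrative assumptions. It gives every action the same true Q value, adds zero-mean noise, and averages the maximum of the noisy estimates over many trials. For contrast, it also evaluates the selected action with a second, independent set of estimates.

```python
import random


def simulate_max_bias(n_actions=5, true_q=1.0, noise_std=0.5,
                      n_trials=10_000, seed=0):
    """Average value estimates when all actions share the same true Q value.

    Hypothetical helper for illustration only; the Gaussian noise model
    is an assumption, not something stated in the text.
    """
    rng = random.Random(seed)
    coupled_total = 0.0
    decoupled_total = 0.0
    for _ in range(n_trials):
        # Two independent noisy estimates of the same true Q values.
        q_a = [true_q + rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q_b = [true_q + rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # Same estimates select and evaluate the action (the max operator).
        coupled_total += max(q_a)
        # q_a selects the action, q_b evaluates it.
        best_action = max(range(n_actions), key=lambda a: q_a[a])
        decoupled_total += q_b[best_action]
    return coupled_total / n_trials, decoupled_total / n_trials


coupled, decoupled = simulate_max_bias()
print(f"true Q value:                 {1.0:.3f}")
print(f"max over noisy estimates:     {coupled:.3f}")
print(f"selection/evaluation split:   {decoupled:.3f}")
```

Under these assumptions, the average of the maximum lands noticeably above the true value even though each individual estimate is unbiased, while the estimate that uses independent values for evaluation stays close to the true value.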