Double DQN or the fixed Q targets

To understand why we might use two networks in combination, or dueling, we first need to understand the problem this solves. Let's go back to how we calculated the TD loss and used it to estimate the value of actions. As you may recall, we calculated the loss from an estimate of the target. In our DQN model, however, that target is produced by the same network we are training, so it changes with every update. The analogy we can use here is that our agent may chase its own tail at times, trying to reach a target that keeps moving. The more observant among you may have noticed this during previous training runs as an oscillating reward. What we can do here is create a second, target network that we aim for and update periodically as we go along. This ...
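The idea of aiming at a separate, slowly updated target can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the book's implementation: it uses plain Python Q-tables as stand-ins for the online and target networks, and the toy chain environment, the `SYNC_EVERY` interval, and the names `td_target` and `train` are all hypothetical.

```python
import copy
import random

GAMMA = 0.9      # discount factor (assumed value)
LR = 0.1         # learning rate (assumed value)
SYNC_EVERY = 20  # hypothetical interval for copying online -> target


def td_target(target_q, reward, next_state, done):
    """TD target computed from the frozen target table, not the online one."""
    if done:
        return reward
    return reward + GAMMA * max(target_q[next_state])


def train(steps=2000, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = 4, 2
    # The online table is updated every step; the target table is a lagged copy.
    online_q = [[0.0] * n_actions for _ in range(n_states)]
    target_q = copy.deepcopy(online_q)
    state = 0
    for step in range(1, steps + 1):
        action = rng.randrange(n_actions)  # purely random exploration
        # Toy chain: action 1 moves right, action 0 returns to the start.
        next_state = state + 1 if action == 1 else 0
        done = next_state == n_states - 1
        reward = 1.0 if done else 0.0
        # The target stays fixed between syncs, so the online estimate
        # is not chasing a value that moves with its own updates.
        target = td_target(target_q, reward, next_state, done)
        online_q[state][action] += LR * (target - online_q[state][action])
        if step % SYNC_EVERY == 0:
            target_q = copy.deepcopy(online_q)  # periodic hard update
        state = 0 if done else next_state
    return online_q, target_q


if __name__ == "__main__":
    online_q, target_q = train()
    for s, row in enumerate(online_q):
        print(s, [round(v, 3) for v in row])
```

With a neural network the same pattern would apply, with the table copy replaced by copying the online network's weights into the target network at a chosen interval.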

