December 2019
In this chapter, we will look at three modifications to the DQN algorithm—target networks, Double DQN [141], and Prioritized Experience Replay [121]. Each modification addresses a separate issue with DQN, so they can be combined to yield significant performance improvements.
In Section 5.1 we discuss target networks, which are lagged copies of the Q-network Q_θ(s, a). The target network Q_φ(s, a) is then used to generate the maximum Q-value in the next state s′ when calculating the target value Q_tar(s, a) = r + γ max_a′ Q_φ(s′, a′), in contrast to the DQN algorithm from Chapter 4, which used Q_θ(s, a) itself when calculating Q_tar(s, a). This helps to stabilize training by reducing the speed at which Q_tar(s, a) changes.
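The target-network mechanism can be sketched in a few lines. The snippet below is a minimal illustration, not the book's implementation: it stands in for a Q-network with a hypothetical linear function Q(s, a) = s · W[:, a], and the interval name `update_period` and the learning rate are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny linear Q-function: Q(s, a) = s @ W[:, a].
# W is the trainable network; W_tar is its lagged copy (the target network).
n_features, n_actions = 4, 2
W = rng.normal(size=(n_features, n_actions))
W_tar = W.copy()  # target network starts as an exact copy

gamma = 0.99
update_period = 100  # copy W into W_tar every `update_period` steps (assumed value)

def q_values(weights, s):
    return s @ weights

def td_target(r, s_next, done):
    # The lagged copy W_tar generates max_a' Q_tar(s', a'); because it is
    # updated only periodically, the target changes slowly during training.
    return r + gamma * (1.0 - done) * q_values(W_tar, s_next).max()

for step in range(1, 301):
    # synthetic transition (s, a, r, s') in place of real environment data
    s, s_next = rng.normal(size=n_features), rng.normal(size=n_features)
    a, r, done = int(rng.integers(n_actions)), float(rng.normal()), 0.0
    y = td_target(r, s_next, done)
    # one gradient step on (Q_theta(s, a) - y)^2 with respect to W only
    td_error = q_values(W, s)[a] - y
    W[:, a] -= 0.01 * td_error * s
    if step % update_period == 0:
        W_tar = W.copy()  # periodic hard update of the lagged copy
```

A soft (Polyak) update, W_tar ← β W_tar + (1 − β) W at every step, is a common alternative to the periodic hard copy shown here.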
Next, we discuss the Double DQN algorithm in Section 5.2. Double DQN uses two Q-networks to calculate Q_tar(s, a), which reduces DQN's tendency to overestimate Q-values.