October 2019
Intermediate to advanced
366 pages
12h 4m
English
For off-policy algorithms (such as DQN), n-step learning works well with small values of n. In DQN, it has been shown that the algorithm works well with values of n between 2 and 4, leading to improvements in a wide range of Atari games.
In the following graph, the results of our implementation are visible. We tested DQN with a three-step return. From the results, we can see that it requires more time before taking off. Afterward, it has a steeper learning curve but with an overall similar learning curve compared to DQN:

Read now
Unlock full access