Q-learning with neural network function approximation was long considered unstable, until a few key tricks made it practical and feasible. Deep Q-learning rests on two workhorses, although other variants of the algorithm have since been developed to address performance and convergence problems of the original solution; such variants, including double Q-learning, delayed Q-learning, greedy GQ, and speedy Q-learning, are not discussed in our project. The two main DQN workhorses that we are going to explore are experience replay and a trade-off between exploration and exploitation that decreases over time.
With experience replay, we simply store the observed states of the game in a queue of a fixed size, since we discard older experiences as new ones arrive.
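As a rough illustration, such a replay memory can be sketched as a fixed-size FIFO buffer from which mini-batches are later drawn uniformly at random. The sketch below assumes transitions are stored as (state, action, reward, next_state, done) tuples; the class name `ReplayBuffer` and parameters such as `capacity` and `batch_size` are our own illustrative choices, not part of the original implementation.

```python
# A minimal sketch of an experience-replay buffer (illustrative, not the
# original implementation): a fixed-size FIFO queue of transitions.
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity=10_000):
        # deque with maxlen silently discards the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # store one transition observed while playing the game
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # uniform random sampling breaks the temporal correlation
        # between consecutive game frames
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

A buffer like this would be filled during play (`push`) and queried at training time (`sample`) to build the mini-batches on which the Q-network is updated.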