September 2018
Intermediate to advanced
296 pages
9h 10m
English
In the two previous chapters, we discussed the deep Q-learning (DQN) algorithm for playing Atari games and the Trust Region Policy Optimization (TRPO) algorithm for continuous control tasks. We saw that these algorithms solve complex problems far more successfully than traditional reinforcement learning algorithms, which do not use deep neural networks to approximate the value function or the policy function. Their main disadvantage, especially for DQN, is that training converges too slowly. For example, training an agent to play Atari games takes about one week, and for more complex games even one week of training is insufficient.
This chapter will introduce a more complicated example, ...