In the previous chapters, we discussed the DQN for playing Atari games and the use of the DPG and TRPO algorithms for continuous control tasks. Recall that DQN has the following architecture:
At each timestep , the agent observes the frame image
and selects an action
based on the current learned policy. ...