In the previous chapters, we discussed the DQN for playing Atari games and the use of the DPG and TRPO algorithms for continuous control tasks. Recall that DQN has the following architecture:
At each timestep , the agent observes the frame image and selects an action based on the current learned policy. ...