September 2018
As discussed in the previous chapter, DQN uses a Q-network to estimate the state-action value function, with a separate output for each available action. That architecture cannot be applied here, because the action space is continuous and the number of actions is not finite. A careful reader may remember that there is another Q-network architecture, one that takes both the state and the action as inputs and outputs an estimate of the corresponding Q-value. This architecture doesn't require the number of available actions to be finite, and can therefore deal with continuous input actions:
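Such a critic can be sketched as follows. This is an illustrative PyTorch implementation, not the book's code: the class name, layer sizes, and dimensions are assumptions chosen for the example.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Critic that takes both state and action as input and returns
    a single scalar Q-value estimate (illustrative sketch)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one output: Q(s, a)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Concatenate along the feature dimension: the continuous action
        # is simply treated as additional input features.
        return self.net(torch.cat([state, action], dim=1))


# A batch of 4 states (dim 3) paired with continuous actions (dim 2):
q_net = QNetwork(state_dim=3, action_dim=2)
q_values = q_net(torch.randn(4, 3), torch.randn(4, 2))
print(q_values.shape)  # torch.Size([4, 1])
```

Because the action enters as part of the input vector rather than indexing an output head, the same network evaluates any real-valued action without enumerating a discrete action set.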

If we use this kind of network to estimate ...