Key elements of the DDQN's computational graph include placeholder variables for batches of states, next states, actions, and rewards:
import tensorflow as tf  # TensorFlow 1.x API

# input to Q network
state = tf.placeholder(dtype=tf.float32, shape=[None, state_dim])
# input to target network
next_state = tf.placeholder(dtype=tf.float32, shape=[None, state_dim])
# action indices (indices of Q network output)
action = tf.placeholder(dtype=tf.int32, shape=[None])
# rewards for target computation
reward = tf.placeholder(dtype=tf.float32, shape=[None])
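To make the role of these placeholders concrete, here is a minimal sketch of the double-DQN target and loss they feed into, assuming q_values, next_q_online, and next_q_target are the outputs of the Q network and target network built later with create_network (these tensor names, the gamma value, and the stop_gradient detail are illustrative, not the book's exact code):

n_actions = 4  # number of discrete actions; matches create_network's default
gamma = 0.99   # discount factor (assumed value)

# Double DQN: the online network selects the greedy next action,
# the target network evaluates it.
best_next_action = tf.argmax(next_q_online, axis=1)
next_value = tf.reduce_sum(
    next_q_target * tf.one_hot(best_next_action, depth=n_actions), axis=1)
# stop_gradient keeps the optimizer from adjusting the target side;
# terminal-state masking is omitted for brevity.
td_target = tf.stop_gradient(reward + gamma * next_value)

# Q(s, a) for the actions actually taken in the batch
q_taken = tf.reduce_sum(q_values * tf.one_hot(action, depth=n_actions), axis=1)
loss = tf.reduce_mean(tf.squared_difference(td_target, q_taken))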
The create_network function generates the three dense layers that can be trained and/or reused as required by the Q network and its slower-moving target network:
def create_network(s, layers, trainable, reuse, n_actions=4):
    """Generate ...
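The excerpt cuts off here. A plausible completion of create_network, assuming a TensorFlow 1.x stack of dense layers sized by the layers argument (the ReLU activation, the layer names, and the usage below are assumptions, not the book's exact code):

def create_network(s, layers, trainable, reuse, n_actions=4):
    """Map a batch of states to one Q-value per action."""
    h = s
    for i, units in enumerate(layers):
        h = tf.layers.dense(h, units, activation=tf.nn.relu,
                            trainable=trainable, reuse=reuse,
                            name='dense_%d' % i)
    # linear output layer: one Q-value per action
    return tf.layers.dense(h, n_actions, activation=None,
                           trainable=trainable, reuse=reuse,
                           name='q_out')

Wrapping each copy in its own variable scope keeps the online and target weights separate, while reuse=True shares the online weights when scoring next_state:

with tf.variable_scope('online'):
    q_values = create_network(state, layers=[64, 64],
                              trainable=True, reuse=False)
    next_q_online = create_network(next_state, layers=[64, 64],
                                   trainable=True, reuse=True)
with tf.variable_scope('target'):
    next_q_target = create_network(next_state, layers=[64, 64],
                                   trainable=False, reuse=False)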