April 2017
Intermediate to advanced
318 pages
7h 40m
English
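Many deep reinforcement learning methods build on a model-free reinforcement learning technique called Q-learning. Q-learning can be used to find an optimal action for any given state in a finite Markov decision process. It does so by learning the Q-function, which represents the maximum discounted future reward we can obtain when we perform action a in state s and act optimally afterwards:

$$Q(s_t, a_t) = \max_{\pi} R_t, \qquad R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots$$

Here γ is a discount factor between 0 and 1, and the maximum is taken over the policies π followed after a_t.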
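The discounted future reward R_t can be computed for a finite episode by working backwards with R_t = r_t + γ R_{t+1}. The following is a minimal sketch. The reward sequence and the value γ = 0.9 are hypothetical values chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite reward sequence."""
    total = 0.0
    # Walk the rewards from last to first, so each step applies R_t = r_t + gamma * R_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: rewards 1, 0, 2 with gamma = 0.9 give 1 + 0.81 * 2 (about 2.62).
print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))
```

The backward loop avoids computing powers of γ explicitly. It is the same recursion that Q-learning exploits one step at a time.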
Once we know the Q-function, the optimal action a in a state s is the one with the highest Q-value. We can then define a policy π(s) that gives us the optimal action in any state:

$$\pi(s) = \arg\max_{a} Q(s, a)$$
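For a finite MDP, the Q-function can be stored as a table, and the greedy policy π(s) = argmax_a Q(s, a) becomes a lookup of the best column in row s. The sketch below assumes a small hypothetical Q-table with 3 states and 3 actions. The values are invented for illustration.

```python
# Hypothetical Q-table: Q[s][a] for 3 states and 3 actions.
Q = [
    [0.1, 0.5, 0.2],  # state 0
    [0.7, 0.3, 0.0],  # state 1
    [0.0, 0.0, 0.9],  # state 2
]


def greedy_policy(Q, s):
    """Return pi(s) = argmax_a Q(s, a); ties resolve to the lowest action index."""
    row = Q[s]
    # max() returns the first index whose Q-value is largest.
    return max(range(len(row)), key=lambda a: row[a])


print([greedy_policy(Q, s) for s in range(len(Q))])  # [1, 0, 2]
```

Ties are broken deterministically here. In practice, a learner that is still training usually mixes this greedy choice with some exploration.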
We can define the Q-function for a transition $(s_t, a_t, r_t, s_{t+1})$ in terms of the Q-function ...
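In standard tabular Q-learning, the value of a transition (s_t, a_t, r_t, s_{t+1}) is tied to the next state through the Bellman target r_t + γ max_a' Q(s_{t+1}, a'). The stored estimate is then moved toward that target by a learning rate α. The sketch below follows this textbook form and is not necessarily the book's own formulation. The function name, the defaults α = 0.1 and γ = 0.9, and the small Q-table are hypothetical.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update for the transition (s, a, r, s_next).

    target  = r + gamma * max_a' Q[s_next][a']   (just r if s_next is terminal)
    Q[s][a] <- Q[s][a] + alpha * (target - Q[s][a])
    """
    # Bellman target: immediate reward plus discounted best next-state Q-value.
    target = r if terminal else r + gamma * max(Q[s_next])
    # Move the current estimate a fraction alpha of the way toward the target.
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


# Hypothetical 2-state, 2-action table.
Q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
print(new_value)  # target = 1 + 0.9 * 2 = 2.8, so Q[0][1] becomes about 0.28
```

Deep Q-learning replaces the table with a neural network, but it trains toward the same kind of target.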