November 2019
Intermediate to advanced
296 pages
7h 52m
English
Q-learning is one of the most widely used reinforcement learning algorithms for constructing the optimal policy by estimating the action-value function. It is an iterative process: it starts from an initial estimate of the action-value function and updates that estimate with each observation:
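The update equation itself is not reproduced in this copy. For reference, a common way to write the tabular Q-learning update is shown below. This form is assumed here and is not taken from the original. In it, α is the learning rate, γ the discount factor, r the reward observed after taking action a in state s, and s' the resulting next state:

$$
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a')\right]
$$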

Alpha (α) is the learning rate, which controls how much the estimate changes in a single update. Note that the update does not involve the transition function, so the action-value function can be estimated from observations alone.
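To illustrate that the update needs only sampled transitions, here is a minimal tabular sketch in Python. The five-state corridor environment (`step`), the reward of 1 for reaching the right end, and the settings alpha=0.1, gamma=0.9, epsilon=0.2 are all hypothetical choices made for this example. The learner calls `step` to sample outcomes but never reads a transition table.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical environment. The learner can sample it but never inspects it."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Initial action-value function: all zeros.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behaviour policy; ties between equal values broken at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target built only from the observed reward and next state (no transition model).
            target = reward if done else reward + gamma * max(q[next_state])
            # Move the estimate a fraction alpha of the way toward the target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
# Greedy action in each non-terminal state.
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

Because the learner only ever consumes (s, a, r, s') tuples, the same loop would work unchanged against any environment that can be sampled. In this toy corridor, the learned greedy policy should choose action 1 (move right) in every non-terminal state.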
But how can we be sure that this iterative process converges to the optimal value? Let's rewrite the equation as follows:
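The rewritten equation is also missing from this copy. Assuming the update is written in the standard form Q(s,a) ← (1−α)Q(s,a) + α[r + γ max_{a'} Q(s',a')], expanding the product gives the following equivalent form. Here the second term is the learning rate times the bracketed quantity, usually called the temporal-difference error:

$$
\begin{aligned}
Q(s,a) &\leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a')\right] \\
&= Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]
\end{aligned}
$$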
If the process converges, the second term ...