April 2018
Intermediate to advanced
334 pages
10h 18m
English
Q-learning attempts to learn the value Q(s,a) of taking a specific action, a, in a particular state, s. Consider a table in which the number of rows represents the number of states and the number of columns represents the number of actions. This is called a Q-table. Learning its entries tells us which action is best for the agent in a given state: the one with the highest Q-value in that state's row.
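To make the table layout concrete, here is a minimal sketch of a Q-table in plain Python. The sizes (4 states, 2 actions), the filled-in values for state 2, and the helper name best_action are assumptions made up for illustration; they do not come from the text.

```python
# Hypothetical sizes, chosen only for illustration.
N_STATES = 4
N_ACTIONS = 2

# One row per state, one column per action; every entry starts at zero.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

# Pretend some learning has already happened for state 2 (made-up values).
q_table[2][0] = 0.1
q_table[2][1] = 0.7


def best_action(q_table, state):
    """Return the column index holding the highest Q-value in the state's row."""
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])


print(best_action(q_table, 2))  # prints 1, because 0.7 > 0.1
```

Looking up the best action is just a search along one row, which is why the tabular form only suits problems with a small, discrete set of states and actions.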
Steps involved in Q-learning:
Initialize the table of Q(s,a) with uniform values (say, all zeros).
Observe the current state, s
Choose an action, a, using epsilon-greedy or any other action-selection policy, and take the action
As a result, a reward, r, is received and a new state, s', is perceived
Update the Q value ...
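The steps listed so far can be sketched as a short training loop. Everything environment-specific here is an assumption for illustration: a hypothetical five-state corridor where action 0 moves left, action 1 moves right, and reaching the last state gives a reward of 1. The values of ALPHA, GAMMA and EPSILON are likewise arbitrary. The update line uses the standard textbook Q-learning rule, Q(s,a) ← Q(s,a) + α [r + γ max Q(s',a') − Q(s,a)], on the assumption that this is the update the list refers to.

```python
import random

N_STATES, N_ACTIONS = 5, 2              # hypothetical corridor: 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed learning rate, discount, exploration


def step(state, action):
    """Toy environment (an assumption): reward 1.0 for reaching the last state."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_table, state, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if rng.random() < EPSILON:
        return rng.randrange(N_ACTIONS)
    row = q_table[state]
    best = max(row)
    # Break ties at random so an all-zero row does not always yield action 0.
    return rng.choice([a for a, q in enumerate(row) if q == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # Initialize the table with all zeros.
    q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False                               # observe the current state
        while not done:
            action = choose_action(q_table, state, rng)      # choose and take an action
            next_state, reward, done = step(state, action)   # receive r, perceive s'
            # Standard Q-learning update (assumed); no bootstrap from a terminal state.
            target = reward + (0.0 if done else GAMMA * max(q_table[next_state]))
            q_table[state][action] += ALPHA * (target - q_table[state][action])
            state = next_state
    return q_table


q_table = train()
for s, row in enumerate(q_table):
    print(s, [round(q, 3) for q in row])
```

After training, the greedy policy can be read off the table by taking the highest-valued action in each row; in this toy corridor that should be "right" in every non-terminal state.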