October 2019
Intermediate to advanced
340 pages
8h 39m
English
In Step 2, the epsilon-greedy policy takes two parameters: ε, a value from 0 to 1, and |A|, the number of possible actions. Each non-greedy action is taken with a probability of ε/|A|, and the action with the highest state-action value is chosen with a probability of 1-ε+ε/|A|.
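The probabilities above can be sketched in a few lines of plain Python. This is a minimal illustration, not the recipe's own code: the function names `epsilon_greedy_probs` and `choose_action`, the list-based `q_values` argument, and the sample values are all assumptions made for this example.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return the action probabilities of an epsilon-greedy policy.

    q_values: state-action values for one state (hypothetical input format).
    epsilon:  exploration rate, expected to lie in [0, 1].
    """
    n_action = len(q_values)
    # Every action first receives the exploration share epsilon / |A|.
    probs = [epsilon / n_action] * n_action
    # The greedy action (ties go to the lowest index) gets the remaining
    # 1 - epsilon, for a total of 1 - epsilon + epsilon / |A|.
    best_action = max(range(n_action), key=lambda a: q_values[a])
    probs[best_action] += 1.0 - epsilon
    return probs


def choose_action(q_values, epsilon, rng=random):
    """Sample one action index according to the epsilon-greedy probabilities."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Example with four actions and made-up values; action 1 is the greedy one.
probs = epsilon_greedy_probs([0.1, 0.5, -0.2, 0.0], epsilon=0.1)
action = choose_action([0.1, 0.5, -0.2, 0.0], epsilon=0.1)
```

With ε = 0.1 and |A| = 4, each non-greedy action has probability 0.1/4 = 0.025 and the greedy action has 1 - 0.1 + 0.025 = 0.925, so the four probabilities sum to 1.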
In Step 3, we perform Q-learning in the following tasks:
In Step 6, again, up = 0, right = 1, down = 2, and left = 3; thus, following the optimal policy, the agent starts in state 36, ...