October 2019
Intermediate to advanced
340 pages
8h 39m
English
In the previous recipe, we searched for the optimal policy using MC control with greedy search where the action with the highest state-action value was selected. However, the best choice available in early episodes does not guarantee an optimal solution. If we just focus on what is temporarily the best option and ignore the overall problem, we will be stuck in local optima instead of reaching the global optima. The workaround is epsilon-greedy policy.
In MC control with epsilon-greedy policy, we no longer exploit the best action all the time, but choose an action randomly under certain probabilities. As the name implies, the algorithm has two folds:
Read now
Unlock full access