January 2020
Intermediate to advanced
432 pages
10h 18m
English
For several chapters now, we have assumed that our agent is greedy. That is, it always chooses the highest-valued action its policy offers. However, as we have seen, this does not always lead to the optimal reward. Instead, we find that letting an agent explore randomly early on, and then reducing the chance of exploration over time, substantially improves learning. That said, if the environment is large or complex, the agent may need more exploration time than it would in a much smaller environment. Conversely, if we maintained high exploration in a small environment, our agent would simply waste time exploring. This is the trade-off you need to balance and it is often tied to ...
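To make the idea of exploring early and then reducing exploration concrete, here is a minimal sketch of epsilon-greedy action selection with a decaying exploration rate. The excerpt does not specify a schedule, so the exponential decay, the function names (`decayed_epsilon`, `choose_action`), and the default values are all assumptions chosen for illustration.

```python
import random


def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Exploration rate after `step` steps.

    Assumed schedule: exponential decay from eps_start, floored at eps_min.
    All default values are illustrative, not taken from the text.
    """
    return max(eps_min, eps_start * decay ** step)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection over a list of action-value estimates.

    With probability epsilon, pick a uniformly random action (explore);
    otherwise pick the action with the highest estimate (exploit).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    q_values = [0.1, 0.9, 0.3]  # hypothetical value estimates for three actions
    for step in (0, 100, 1000):
        eps = decayed_epsilon(step)
        print(step, round(eps, 3), choose_action(q_values, eps))
```

Under these assumptions, a `decay` closer to 1 keeps exploration high for longer, which may suit a large or complex environment. A smaller `decay` moves the agent to greedy behaviour sooner, which may be enough for a small one.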