January 2020
Intermediate to advanced
432 pages
10h 18m
English
There are two ways to implement Monte Carlo control for an agent. They differ in how they calculate the average return (the sample mean) for a state. In First-Visit Monte Carlo, only the return that follows the first visit to a state within an episode is included in that state's average. In Every-Visit Monte Carlo, the return that follows every visit to the state is included. The latter method is what we will explore in the code example for this chapter.
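To make the difference concrete, here is a minimal sketch of the averaging step only. It is not the chapter's code example. It assumes an episode is a list of `(state, reward)` pairs, where the reward is the one received after leaving that state. The function name `mc_state_values` and its parameters are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def mc_state_values(episodes, gamma=1.0, first_visit=False):
    """Estimate state values by averaging sampled returns.

    episodes: list of episodes; each episode is a list of (state, reward)
        pairs (an assumed format, not taken from the book).
    gamma: discount factor.
    first_visit: if True, use only the first visit to each state per
        episode; if False, use every visit.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Work backwards to get the discounted return G_t at each time step.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g

        seen = set()  # states already counted in this episode
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue  # first-visit: skip repeat visits
            seen.add(state)
            returns_sum[state] += returns[t]
            returns_count[state] += 1

    # The value estimate is the mean of the sampled returns per state.
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


if __name__ == "__main__":
    episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
    print(mc_state_values([episode], first_visit=False))
    print(mc_state_values([episode], first_visit=True))
```

For the single episode above with `gamma=1.0`, state A is visited twice, with returns 3.0 and 2.0. Every-visit averages both and estimates 2.5 for A, while first-visit keeps only the first and estimates 3.0. Both methods give 2.0 for B, which is visited once.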