April 2018
Intermediate to advanced
334 pages
10h 18m
English
Monte Carlo is the simplest approach for model free learning, where the agent observes rewards for all steps ahead in an episode, that is, a full look ahead. Thus, total estimate reward at time t would be
given by:

Here,
is the discount factor and T being the time step when the episode ends. We can initialize the Monte Carlo learning technique using the following code:
Initialize: i.e. the policy to be evaluated V i.e. ...