Approaching model-free algorithms

In the previous section, Understanding Monte Carlo methods, we said that Monte Carlo methods do not require a model of the environment to estimate the value function or to discover good policies. This means that Monte Carlo is model-free: no knowledge of Markov decision process (MDP) transitions or rewards is required. In other words, we don't need to have modeled the environment in advance; the necessary information is collected while interacting with the environment (online learning). Monte Carlo methods learn directly from episodes of experience, where an episode of experience is a sequence of tuples of the form (state, action, reward, next state).
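To make the idea concrete, here is a minimal sketch of first-visit Monte Carlo value estimation. The toy environment (`run_episode`) and all names are hypothetical, chosen only for illustration: each episode is represented exactly as described above, a list of (state, action, reward, next state) tuples, and state values are estimated by averaging the returns observed after each state's first visit.

```python
from collections import defaultdict

def run_episode():
    # Hypothetical two-state episodic environment, for illustration only:
    # from state 0, action "go" yields reward 0 and moves to state 1;
    # from state 1, action "go" yields reward 1 and ends the episode.
    # An episode is a list of (state, action, reward, next_state) tuples.
    return [(0, "go", 0.0, 1), (1, "go", 1.0, None)]

def first_visit_mc(num_episodes, gamma=1.0):
    # Map each state to the list of returns observed after its first visit.
    returns = defaultdict(list)
    for _ in range(num_episodes):
        episode = run_episode()
        g = 0.0
        # Walk the episode backwards, accumulating the discounted return G.
        for t in reversed(range(len(episode))):
            state, _action, reward, _next_state = episode[t]
            g = reward + gamma * g
            # First-visit rule: record G only if this is the earliest
            # occurrence of the state within the episode.
            if state not in (step[0] for step in episode[:t]):
                returns[state].append(g)
    # The value estimate is the average of the collected returns.
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```

Because no transition probabilities or reward model are ever consulted, the estimate is built purely from sampled experience, which is exactly what makes the method model-free.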

