In the previous chapter, we learned how to compute the optimal policy using two dynamic programming methods: value iteration and policy iteration. Dynamic programming is a model-based method: it requires the model dynamics of the environment in order to compute the value and Q functions and, from them, the optimal policy.
But suppose we don't have the model dynamics of the environment. How do we compute the value and Q functions then? This is where model-free methods come in: they compute the value and Q functions, and thus the optimal policy, without requiring the model dynamics of the environment. One popular model-free method is the Monte Carlo (MC) method.
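To make the contrast concrete, here is a minimal sketch (not the book's own code) of first-visit Monte Carlo prediction: estimating the value function of a fixed policy purely from sampled episodes, with no transition probabilities or reward model. It assumes a hypothetical Gym-style environment with reset() and step() methods and a policy given as a function from states to actions.

from collections import defaultdict

def mc_prediction(env, policy, num_episodes=1000, gamma=0.9):
    """First-visit Monte Carlo prediction: estimate V(s) for a fixed
    policy using only sampled episodes (no model dynamics needed)."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for _ in range(num_episodes):
        # Roll out one episode under the given policy.
        episode = []                      # list of (state, reward) pairs
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            episode.append((state, reward))
            state = next_state

        # Walk backwards, accumulating the return G, and average it
        # into V(s) at the first visit of each state in the episode.
        states_in_episode = [s for s, _ in episode]
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = gamma * G + r
            if s not in states_in_episode[:t]:   # first visit of s
                returns_sum[s] += G
                returns_count[s] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

Notice that the only access to the environment is through sampled interaction; nowhere do we use transition probabilities, which is exactly what distinguishes a model-free method from dynamic programming.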
We will begin the ...