In value iteration, we start off with a random value function. Since a random value function is unlikely to be optimal, we iteratively compute new, improved value functions until we reach the optimal value function. Once we have the optimal value function, we can easily derive an optimal policy from it.
The steps involved in value iteration are as follows (the update rule and a code sketch follow the list):
- First, we initialize a random value function, that is, a random value for each state.
- Then we compute the Q function, Q(s, a), for all state-action pairs.
- Then we update our value function with the maximum value of Q(s, a) over the actions in each state.
- We repeat these steps until the change in the value function becomes negligible, that is, until it converges to the optimal value function.
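
In symbols, the second and third steps combine into the Bellman optimality update. The notation below is the standard one (an assumption, since the text only names Q(s, a)): P(s' | s, a) is the transition probability, R(s, a, s') the reward, and γ the discount factor:

$$
Q_k(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma V_k(s')\bigr], \qquad V_{k+1}(s) = \max_a Q_k(s, a)
$$

Once the values stop changing, $V_k$ is the optimal value function, and the optimal policy is read off greedily: $\pi^*(s) = \arg\max_a Q(s, a)$.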
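
To make the loop concrete, here is a minimal Python sketch of value iteration. Everything about the environment is an illustrative assumption: the MDP is given as nested dicts where P[s][a] is a list of (probability, next_state, reward) tuples, and the two-state example, GAMMA, and THRESHOLD are made up for the demo:

```python
GAMMA = 0.9        # discount factor (assumed)
THRESHOLD = 1e-6   # convergence threshold (assumed)

# Hypothetical two-state MDP: P[s][a] -> list of (prob, next_state, reward).
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 1.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
}

def q_value(P, V, s, a, gamma):
    # Expected return of taking action a in state s, then following V.
    return sum(prob * (reward + gamma * V[s_next])
               for prob, s_next, reward in P[s][a])

def value_iteration(P, gamma=GAMMA, threshold=THRESHOLD):
    # Step 1: initialize the value function arbitrarily (zeros here).
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Step 2: compute Q(s, a) for every action in state s.
            q_values = [q_value(P, V, s, a, gamma) for a in P[s]]
            # Step 3: replace V(s) with the maximum Q value.
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        # Step 4: stop once the value function has (almost) stopped changing.
        if delta < threshold:
            break
    # Derive the optimal policy greedily from the optimal value function.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
              for s in P}
    return V, policy

V, policy = value_iteration(P)
print(V)       # optimal state values
print(policy)  # greedy policy derived from V
```

Note that V is updated in place here (a Gauss-Seidel-style sweep); keeping a copy of the previous V and updating from it matches the equations above more literally, and both variants converge.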