Value iteration

In value iteration, we start with a randomly initialized value function. A random value function is unlikely to be optimal, so we iteratively improve it until we arrive at the optimal value function. Once we have the optimal value function, we can easily derive an optimal policy from it.
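For reference, the improvement step is usually written as a Bellman optimality backup. The notation below is an assumption, since the excerpt does not define it: P(s' | s, a) is the probability of moving from state s to state s' under action a, R(s, a, s') is the reward for that transition, gamma is a discount factor between 0 and 1, and k counts iterations.

```latex
\begin{align}
Q_k(s, a)   &= \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V_k(s') \bigr] \\
V_{k+1}(s)  &= \max_{a} Q_k(s, a) \\
\pi^*(s)    &= \arg\max_{a} Q^*(s, a)
\end{align}
```

The last line is one common way to read an optimal policy off the converged values: in each state, pick the action with the largest Q value.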

The steps involved in value iteration are as follows:

  1. First, we initialize the value function randomly, that is, we assign a random value to each state.
  2. Then we compute the Q function, Q(s, a), for all state-action pairs.
  3. Then we update our value function with the maximum value of Q(s, a) over the actions available in each state.
  4. We repeat these steps ...
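The steps above can be sketched in plain Python. This is a minimal illustration, not the book's implementation: the three-state MDP, the names `TRANSITIONS`, `q_value`, `value_iteration`, and `greedy_policy`, the discount factor `gamma`, and the stopping rule (stop once the largest change in any state's value falls below `tol`) are all assumptions made for the example.

```python
import random

# Hypothetical toy MDP (not from the text): 3 states, 2 actions.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(0.9, 2, 1.0), (0.1, 1, 0.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing, no reward
}


def q_value(transitions, values, state, action, gamma):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in transitions[state][action]
    )


def value_iteration(transitions, gamma=0.9, tol=1e-8, max_iters=10_000, seed=0):
    rng = random.Random(seed)
    # Step 1: assign a random value to each state.
    values = {state: rng.random() for state in transitions}
    for _ in range(max_iters):
        # Step 2: compute Q(s, a) for every state-action pair.
        q = {
            state: {
                action: q_value(transitions, values, state, action, gamma)
                for action in transitions[state]
            }
            for state in transitions
        }
        # Step 3: update each state's value with the max over its Q values.
        new_values = {state: max(q[state].values()) for state in q}
        # Assumed stopping rule: stop when no value changes by more than tol.
        delta = max(abs(new_values[s] - values[s]) for s in values)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each state, the action with the largest Q value."""
    return {
        state: max(
            transitions[state],
            key=lambda action: q_value(transitions, values, state, action, gamma),
        )
        for state in transitions
    }


values = value_iteration(TRANSITIONS)
policy = greedy_policy(TRANSITIONS, values)
print(values)
print(policy)
```

Because the discount factor is below 1, the random starting values only affect how many sweeps are needed, not the values the loop settles on. In this toy MDP, the greedy policy chooses action 1 in states 0 and 1, which moves toward the rewarded transition into state 2.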
