December 2018
Beginner to intermediate
684 pages
21h 9m
English
The Bellman equations define a recursive relationship between the value function of every state, s ∈ S, and that of its successor states, s′, under a policy, π. They do so by decomposing the value function into the immediate reward and the discounted value of the next state:

$$v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]$$
This equation says that, for a given policy, the value of a state must equal the expected value of its successor states under that policy, plus the expected reward earned along the transition to each successor state.
It implies that, if we know the values of the successor states for the currently available actions, we can look ahead one step and compute ...
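This one-step lookahead can be sketched as iterative policy evaluation: repeatedly replace each state's value with the Bellman backup until the values converge to a fixed point. The two-state MDP, its transition probabilities, rewards, and the uniform policy below are illustrative assumptions, not taken from the text.

```python
GAMMA = 0.9  # discount factor

# transitions[s][a] -> list of (probability, next_state, reward)
transitions = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "go":   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)],
           "go":   [(1.0, "s0", 0.0)]},
}

# policy[s][a] -> probability of taking action a in state s (uniform here)
policy = {s: {a: 1.0 / len(acts) for a in acts}
          for s, acts in transitions.items()}

def bellman_backup(v, s):
    """One-step lookahead:
    v_pi(s) = sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [r + gamma * v(s')]."""
    return sum(
        policy[s][a] * sum(p * (r + GAMMA * v[s2]) for p, s2, r in outcomes)
        for a, outcomes in transitions[s].items()
    )

# Iterative policy evaluation: apply the backup until the values stop changing.
v = {s: 0.0 for s in transitions}
for _ in range(500):
    v = {s: bellman_backup(v, s) for s in transitions}

print(v)
```

At convergence the values satisfy the Bellman equation exactly: applying another backup leaves them unchanged, which is what makes the equation a consistency condition rather than just an update rule.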