October 2019
Intermediate to advanced
366 pages
12h 4m
English
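By definition, the value function of a policy is the expected return (that is, the expected sum of discounted rewards) obtained by following that policy from a given state:

$$V_\pi(s) = \mathbb{E}_\pi\left[G_t \mid s_t = s\right] = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k r_{t+k} \mid s_t = s\right]$$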
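As a rough illustration of the definition above, the following sketch computes the discounted return of a single reward sequence and approximates the expectation with a sample mean over several episodes that start in the same state. The function names, the reward lists, and the discount factor are hypothetical and chosen only for this example.

```python
def discounted_return(rewards, gamma):
    """Return the sum of rewards, each discounted by gamma ** (its time step)."""
    g = 0.0
    # Accumulate backwards so that G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def average_return(episodes, gamma):
    """Sample mean of the discounted returns of episodes from the same start state."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Two made-up reward sequences, assumed to start from the same state.
episodes = [[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
estimate = average_return(episodes, gamma=0.5)
```

With more sampled episodes, the sample mean would get closer to the expected return that the value function denotes.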

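Following the reasoning of Chapter 3, Solving Problems with Dynamic Programming, DP algorithms update each state value by taking the expectation, over all possible next states, of the immediate reward plus the discounted value of the next state:

$$V_\pi(s) = \mathbb{E}_\pi\left[r_t + \gamma V_\pi(s_{t+1}) \mid s_t = s\right] = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma V_\pi(s')\right]$$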
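The DP update described above can be sketched as a single sweep over all states. This is a minimal illustration, not the book's implementation: the model layout `P[s][a]` (a list of `(probability, next_state, reward)` tuples), the `policy[s][a]` table, and the two-state example are all assumptions made for this sketch.

```python
def bellman_expectation_backup(V, policy, P, gamma):
    """One synchronous sweep that recomputes every state value from the model."""
    new_V = []
    for s in range(len(V)):
        v = 0.0
        # Expectation over actions, weighted by the policy.
        for a, pi_sa in enumerate(policy[s]):
            # Expectation over next states, weighted by the transition probabilities.
            for prob, s_next, reward in P[s][a]:
                v += pi_sa * prob * (reward + gamma * V[s_next])
        new_V.append(v)
    return new_V


# Hypothetical two-state model with a single action per state.
# State 0: stays in 0 (reward 0) or moves to 1 (reward 1), each with probability 0.5.
# State 1: absorbing, with reward 0.
P = [
    [[(0.5, 0, 0.0), (0.5, 1, 1.0)]],
    [[(1.0, 1, 0.0)]],
]
policy = [[1.0], [1.0]]

V = [0.0, 0.0]
V = bellman_expectation_backup(V, policy, P, gamma=0.9)
V = bellman_expectation_backup(V, policy, P, gamma=0.9)
```

Note that the sweep reads the probabilities in `P` directly, so it can only run when the transition model is available.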

Unfortunately, computing the value function in this way requires knowing the state transition probabilities. DP algorithms obtain those probabilities from the model of the environment. ...