October 2019
Intermediate to advanced
366 pages
12h 4m
English
Dynamic programming (DP) algorithms are effective for solving reinforcement learning (RL) problems, but they require two strong assumptions. The first is that the model of the environment has to be known, and the second is that the state space has to be small enough so that it does not suffer from the curse of dimensionality problem.
In this chapter, we'll develop a class of algorithms that get rid of the first assumption. In addition, it is a class of algorithms that aren't affected by the problem of the curse of dimensionality of DP algorithms. These algorithms learn directly from the environment and from the experience, estimating the value function based on many returns, and do not compute the expectation ...
Read now
Unlock full access