In any complex or large-scale sequential decision making problem, there is a crucial need to use function approximation to represent relevant functions, such as the value function or the policy.

The dynamic programming (DP) and reinforcement learning (RL) methods introduced in Chapters 1 and 2 make the implicit assumption that the value function can be perfectly represented (i.e. kept in memory), for example, by using a look-up table (with a finite number of entries) assigning a value to all possible states (assumed to be in finite number) of the system. Those methods are called “exact” because they provide an exact computation of the optimal solution of the considered problem (or at least, enable the computations to converge to this optimal solution). However, such methods often apply to toy problems only, since in most interesting applications, the number of possible states is so large (and possibly infinite if we consider continuous spaces) that a perfect representation of the function at all states is impossible. It becomes necessary to approximate the function by using a moderate number of coefficients (which can be stored in a computer), and therefore extend the range of DP and RL methods to methods using such “approximate” representations. These “approximate” methods combine DP and RL methods with function approximation tools.

EXAMPLE 3.1. Let us come back to the example of the car where maintenance operations should be optimized ...

Start Free Trial

No credit card required