Continuous state and/or action spaces imply an infinite number of state transitions, making it impossible to tabulate the state-action values as we did in the previous section. Instead, we approximate the Q function by learning a continuous, parameterized mapping from training samples.
Motivated by the success of deep neural networks in other domains, which we discussed in the previous chapters of Part 4, researchers have also turned to them for approximating value functions. However, ML faces distinct challenges in the RL context, where the data is generated by the model's interaction with the environment using a (possibly randomized) policy: