October 2019
Intermediate to advanced
340 pages
8h 39m
English
So far, in the MC and TD methods, we have represented the value function as a lookup table. The TD method improves on the MC method by updating the Q-function on the fly during an episode, but it still does not scale well to problems with many states and/or actions: learning a separate value for every individual state-action pair becomes prohibitively slow.
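To make the scaling problem concrete, here is a minimal sketch of what a tabular Q-function costs in storage. The state and action counts are hypothetical, chosen only to illustrate how the table grows multiplicatively:

```python
# Hypothetical sizes: a tabular Q-function needs one entry per
# (state, action) pair, so the table grows multiplicatively.
n_states = 10_000   # e.g. a 100 x 100 discretized state space (illustrative)
n_actions = 4

# The lookup table itself: every entry must be visited (many times)
# before its value estimate becomes accurate.
q_table = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}

print(len(q_table))  # 40000 entries to learn individually
```

With continuous state spaces such as Mountain Car's, any fine discretization makes this table far larger still, which is what motivates replacing the table with a parameterized function.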
This chapter focuses on function approximation, which overcomes the scaling issue in the TD method. We will begin by setting up the Mountain Car environment as a playground. After developing the linear function estimator, we will incorporate ...
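As a preview of the idea, linear function approximation replaces the lookup table with a weight vector per action, so that Q(s, a) ≈ w_a · φ(s), and updates the weights with a semi-gradient TD rule. The sketch below uses a deliberately simple, hypothetical feature map (position, velocity, and a bias term, loosely matching Mountain Car's observation) and illustrative constants; it is not the book's exact implementation:

```python
# Sketch of linear Q-function approximation with a semi-gradient TD(0)-style
# update. Feature map, action set, and hyperparameters are illustrative.
N_FEATURES = 3
ACTIONS = [0, 1, 2]  # e.g. push-left, no-op, push-right in Mountain Car

def features(state):
    """Hypothetical feature map phi(s): position, velocity, and a bias."""
    position, velocity = state
    return [position, velocity, 1.0]

# One weight vector per action: Q(s, a) = w_a . phi(s)
weights = {a: [0.0] * N_FEATURES for a in ACTIONS}

def q_value(state, action):
    phi = features(state)
    return sum(w * f for w, f in zip(weights[action], phi))

def td_update(state, action, reward, next_state, alpha=0.01, gamma=0.99):
    """Semi-gradient step: w_a += alpha * (target - Q(s, a)) * phi(s)."""
    phi = features(state)
    target = reward + gamma * max(q_value(next_state, a) for a in ACTIONS)
    error = target - q_value(state, action)
    for i in range(N_FEATURES):
        weights[action][i] += alpha * error * phi[i]

# One illustrative update from a made-up transition near the valley floor
td_update(state=(-0.5, 0.0), action=2, reward=-1.0, next_state=(-0.49, 0.01))
print(round(q_value((-0.5, 0.0), 2), 4))  # -0.0125
```

The key point is that only `N_FEATURES` weights per action are learned, regardless of how many states exist, and each update generalizes to every state with similar features.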