September 2018
Temporal difference (TD) learning algorithms work by reducing the differences between estimates that the agent makes at different times. Q-learning, which we will discuss in the following section, is a TD algorithm, but it is based on the difference between the estimates for states at immediately adjacent time steps. TD in general is broader and may consider time steps and states that are further apart.
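To make the idea of "reducing the difference between estimates" concrete, here is a minimal sketch of a one-step TD update, often written as TD(0), for a table of state values. The function name `td0_update`, the dictionary `values`, and the parameters `alpha` (learning rate) and `gamma` (discount factor) are illustrative assumptions, not names taken from the text.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to `values` in place and return the TD error.

    All names here are hypothetical; this is a sketch, not a full agent.
    """
    # TD error: the newer estimate (reward plus discounted value of the
    # next state) minus the older estimate for the current state.
    td_error = reward + gamma * values[next_state] - values[state]
    # Move the older estimate a fraction alpha toward the newer one.
    values[state] += alpha * td_error
    return td_error


# Usage: two states with made-up value estimates.
values = {"A": 0.0, "B": 2.0}
error = td0_update(values, "A", reward=1.0, next_state="B", alpha=0.5, gamma=1.0)
print(error, values["A"])
```

With these made-up numbers, the estimate for state `A` moves halfway toward the target `reward + gamma * values["B"]`. Repeating such updates over many transitions is what shrinks the gap between successive estimates.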
TD combines ideas from the Monte Carlo (MC) method and dynamic programming (DP), which can be summarized as follows: