October 2019
Intermediate to advanced
366 pages
12h 4m
English
The algorithms developed so far are value-based: at their core, they learn a state-value function, V(s), or an action-value function, Q(s, a). A value function estimates the expected total reward that can be accumulated from a given state or state-action pair. An action can then be selected based on the estimated action (or state) values.
Therefore, a greedy policy can be defined as follows:

$$\pi(s) = \arg\max_{a} Q(s, a)$$

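As a rough illustration of greedy action selection, the following sketch picks, for each state, the action with the highest estimated action value. It assumes a small tabular Q(s, a) stored as a list of rows (one row per state, one entry per action). The table values and the helper names `greedy_action` and `greedy_policy` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Return the index of the action with the highest estimated value.

    Ties are broken in favor of the lowest action index.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


def greedy_policy(q_table: Sequence[Sequence[float]]) -> List[int]:
    """Map every state (row of the table) to its greedy action."""
    return [greedy_action(row) for row in q_table]


# Hypothetical Q-table: 3 states x 2 actions, values made up for illustration.
q_table = [
    [0.1, 0.5],  # state 0: action 1 has the higher estimate
    [0.7, 0.2],  # state 1: action 0 has the higher estimate
    [0.0, 0.3],  # state 2: action 1 has the higher estimate
]

policy = greedy_policy(q_table)
print(policy)  # [1, 0, 1]
```

With a neural network in place of the table, the same selection rule would apply to the network's predicted action values for the current state.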
Value-based methods, when combined with deep neural networks, can learn very sophisticated policies to control agents that operate in high-dimensional spaces. Despite these great qualities, ...