January 2018
Beginner to intermediate
284 pages
8h 35m
English
Value learning-based RL algorithms focus on a key aspect of defining a value for every state-action pair. More formally, it is defined as the value of expected reward obtained while being in a state S and under a policy π as follows:
Once this value function is defined, the task of choosing an optimal action (or an optimal policy) simply reduces to learning the optimal value function as follows:
One of the challenges in estimating the optimal value of this function is lack of a complete state-action transition ...
Read now
Unlock full access