December 2018
Beginner to intermediate
684 pages
21h 9m
English
In RL, reward signals can arrive significantly later than the actions that contributed to the outcome, which makes it harder to associate actions with their consequences. Because of these delays, the credit assignment problem consists of accurately estimating the benefits and costs of taking a given action in a given state. RL algorithms need to find a way to distribute the credit for positive and negative outcomes among the many decisions that may have been involved in producing them.
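One common way to push a delayed reward back onto earlier decisions is the discounted return, G_t = r_t + gamma * G_{t+1}, which gives every step a share of all rewards that follow it. The sketch below is only an illustration of that idea. It assumes a single finished episode stored as a list of per-step rewards. The function name `discounted_returns` and the discount factor `gamma = 0.9` are hypothetical choices for this example and are not taken from the text.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode.

    Hypothetical helper for illustration: `rewards[t]` is the reward
    received after the action at step t, and `gamma` is the discount factor.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step accumulates the discounted sum of later rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A sparse, delayed reward: only the last step pays off, yet every
# earlier step receives a (geometrically smaller) share of the credit.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(episode_rewards, gamma=0.9))
```

Here the first three actions receive no immediate reward but are still credited with roughly 0.729, 0.81 and 0.9. Discounting spreads credit by distance in time only. It cannot tell which earlier decisions actually mattered, and that gap is what the credit assignment problem refers to.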