Understanding the TCA problem
The credit assignment problem is the task of working out which actions earn the most credit or, in the case of reinforcement learning (RL), the most reward. RL addresses this problem by letting an algorithm or agent find the set of actions that maximizes the rewards. In the previous chapters, we saw how variations of this can be done with dynamic programming (DP) and Monte Carlo (MC) methods. However, both of these methods are offline, so they cannot learn while performing a task.
The temporal credit assignment (TCA) problem differs from the credit assignment (CA) problem in that it needs to be solved across time; that is, an algorithm needs to find the best policy across time steps instead of learning after an episode, ...
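To make the contrast between learning after an episode and learning across time steps concrete, here is a minimal sketch. It is not taken from the text: the three-state chain, the `GAMMA` and `ALPHA` values, and the function names `mc_update` and `step_update` are all hypothetical, and the per-step rule shown is a one-step, TD(0)-style update chosen here as one plausible way to learn during an episode.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value, for illustration only)
ALPHA = 0.1  # step size (assumed value, for illustration only)


def mc_update(values, episode):
    """Every-visit Monte Carlo update, applied only after the episode ends.

    `episode` is a list of (state, reward) pairs, where `reward` is the
    reward received on leaving `state`.
    """
    g = 0.0  # discounted return, accumulated backwards from the final step
    for state, reward in reversed(episode):
        g = reward + GAMMA * g
        values[state] += ALPHA * (g - values[state])


def step_update(values, state, reward, next_state, done):
    """One-step (TD(0)-style) update, applied at every time step."""
    # Bootstrap from the current estimate of the next state unless terminal.
    target = reward if done else reward + GAMMA * values[next_state]
    values[state] += ALPHA * (target - values[state])


# Hypothetical chain A -> B -> C -> terminal, with reward 1 on the last step.
episode = [("A", 0.0), ("B", 0.0), ("C", 1.0)]
states = [state for state, _ in episode]

# Offline: nothing is learned until the whole episode has been collected.
mc_values = defaultdict(float)
mc_update(mc_values, episode)

# Per step: the estimate is adjusted after each transition.
step_values = defaultdict(float)
for i, (state, reward) in enumerate(episode):
    done = i == len(episode) - 1
    next_state = None if done else states[i + 1]
    step_update(step_values, state, reward, next_state, done)

print(dict(mc_values))
print(dict(step_values))
```

In this sketch, `mc_update` cannot run until the full list of transitions is available, whereas `step_update` needs only the current transition, so it can be called while the task is still in progress.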