January 2020
Intermediate to advanced
432 pages
10h 18m
English
The Bellman equation shows us that we can solve any MDP by first finding the optimal policy for an agent traversing that MDP. Recall that a policy defines the action an agent takes in each state, and so guides the agent through the MDP. Ideally, we want to find the optimal policy: one that maximizes the value of every state and so determines which states to traverse for maximum reward. Combining this with the Bellman optimality equation, plus a little more math wizardry, gives us the following optimal policy equation:

That strange term at the very beginning ( ...
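
To make the idea of an optimal policy concrete, here is a minimal sketch. It assumes the standard textbook form of the optimal policy, in which each state picks the action that maximizes the expected reward plus the discounted value of the next state. The three-state MDP, the `transitions` table, the discount `GAMMA`, and the helper names are all hypothetical, chosen only for illustration.

```python
# Hypothetical 3-state MDP, for illustration only.
# transitions[state][action] = list of (probability, next_state, reward)
GAMMA = 0.9  # assumed discount factor

transitions = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},  # absorbing state
}


def q_value(values, state, action):
    """Expected reward plus discounted next-state value for one action."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in transitions[state][action]
    )


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state in transitions:
            best = max(q_value(values, state, a) for a in transitions[state])
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def extract_policy(values):
    """Pick, for each state, the action with the highest Q-value (the argmax)."""
    return {
        state: max(transitions[state], key=lambda a: q_value(values, state, a))
        for state in transitions
    }


optimal_values = value_iteration()
optimal_policy = extract_policy(optimal_values)
print(optimal_values)
print(optimal_policy)
```

In this toy MDP, the only reward comes from moving right out of state 1, so the extracted policy moves right in states 0 and 1. In the absorbing state 2 both actions are equally good, and `max` simply returns the first one it sees.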