Policy evaluation
Unlike the trial-and-error learning, you have already been introduced to DP methods that work as a form of static learning or what we may call planning. Planning is an appropriate definition here since the algorithm evaluates the entire MDP and hence all states and actions beforehand. Hence, these methods require full knowledge of the environment including all finite states and actions. While this works for known finite environments such as the one we are playing within this chapter, these methods are not substantial enough for real-world physical problems. We will, of course, solve real-world problems later in this book. For now, though, let's look at how to evaluate a policy from the previous update equations in code. ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access