Model-Free Policy Iteration
In Part II, we introduce a reinforcement learning approach based on value
functions called policy iteration.
The key issue in the policy iteration framework is how to accurately
approximate the value function from a small number of data samples. In
Chapter 2, a fundamental framework of value function approximation based
on least squares is explained. In this least-squares formulation, how to design
good basis functions is critical for better value function approximation. A
practical basis design method based on manifold-based smoothing (Chapelle
et al., 2006) is explained in Chapter 3.
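As a rough illustration of this least-squares formulation, the sketch below fits a linear combination of Gaussian basis functions to sampled returns on a toy one-dimensional state space. The basis centers, width, and data here are hypothetical choices for illustration, not the book's specific settings.

```python
import numpy as np

# Hypothetical 1-D state space in [0, 1]; Gaussian basis functions
# with assumed centers c_i and common width sigma.
centers = np.linspace(0.0, 1.0, 5)
sigma = 0.2

def phi(s):
    # Feature vector of Gaussian basis functions evaluated at state s
    return np.exp(-(s - centers) ** 2 / (2 * sigma ** 2))

# Toy data: sampled states and noisy observed returns (synthetic)
rng = np.random.default_rng(0)
states = rng.uniform(0.0, 1.0, size=50)
returns = np.sin(2 * np.pi * states) + 0.1 * rng.normal(size=50)

# Least-squares fit: theta = argmin_theta || Phi theta - returns ||^2
Phi = np.stack([phi(s) for s in states])
theta, *_ = np.linalg.lstsq(Phi, returns, rcond=None)

def v_hat(s):
    # Approximated value function as a weighted sum of basis functions
    return phi(s) @ theta
```

The quality of `v_hat` depends heavily on how the basis functions are placed and shaped, which is exactly the design issue Chapter 3 addresses.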
In real-world reinforcement learning tasks, gathering data is often costly.
In Chapter 4, we describe a method for efficiently reusing previously col-
lected samples in the framework of covariate shift adaptation (Sugiyama &
Kawanabe, 2012). In Chapter 5, we apply a statistical active learning tech-
nique (Sugiyama & Kawanabe, 2012) to optimizing data collection strategies
for reducing the sampling cost.
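The core idea behind covariate shift adaptation can be sketched as importance-weighted least squares: samples drawn under one distribution are reweighted by the density ratio of the target distribution to the sampling distribution. The Gaussian densities and linear model below are assumed toy choices, not the method of Chapter 4.

```python
import numpy as np

# Covariate shift: training inputs drawn from p_train, but we care
# about performance under p_test. Importance weights
# w(x) = p_test(x) / p_train(x) reweight the squared loss.
rng = np.random.default_rng(1)
x_train = rng.normal(0.0, 1.0, size=200)          # p_train = N(0, 1)
y_train = x_train ** 2 + 0.1 * rng.normal(size=200)

def gauss_pdf(x, mu, s):
    # Density of N(mu, s^2)
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

# Assumed target distribution p_test = N(1, 0.5^2)
w = gauss_pdf(x_train, 1.0, 0.5) / gauss_pdf(x_train, 0.0, 1.0)

# Importance-weighted least squares on linear features [1, x]
X = np.stack([np.ones_like(x_train), x_train], axis=1)
W = np.diag(w)
theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train)
```

The weighted fit concentrates on the region where test inputs are likely, which is what makes previously collected samples reusable under a different data-collection policy.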
Finally, in Chapter 6, an outlier-robust extension of the least-squares
method based on robust regression (Huber, 1981) is introduced. Such a
robust method is highly useful in handling noisy real-world data.
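As a minimal sketch of the robust-regression idea, the example below fits a line to data with injected outliers by minimizing the Huber loss via iteratively reweighted least squares; the threshold value, data, and iteration count are illustrative assumptions, not the settings of Chapter 6.

```python
import numpy as np

# Toy regression data with a few large outliers injected
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + 0.05 * rng.normal(size=40)
y[::10] += 5.0   # every 10th sample becomes an outlier

X = np.stack([np.ones_like(x), x], axis=1)
delta = 0.5      # Huber threshold (assumed)

# Iteratively reweighted least squares for the Huber loss:
# residuals within delta get weight 1 (quadratic regime),
# larger residuals get weight delta/|r| (linear regime).
theta = np.zeros(2)
for _ in range(20):
    r = y - X @ theta
    w = np.where(np.abs(r) <= delta,
                 1.0,
                 delta / np.maximum(np.abs(r), 1e-12))
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Because outliers receive down-weighted (linear rather than quadratic) penalties, the fitted slope stays close to the outlier-free trend, whereas ordinary least squares would be pulled toward the contaminated points.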