Model-Free Policy
In Part II, we introduce a reinforcement learning approach based on value
functions called policy iteration.
The key issue in the policy iteration framework is how to accurately
approximate the value function from a small number of data samples. In
Chapter 2, a fundamental framework of value function approximation based on
least squares is explained. In this least-squares formulation, the design of
good basis functions is critical for accurate value function approximation. A
practical basis design method based on manifold-based smoothing (Chapelle
et al., 2006) is explained in Chapter 3.
In real-world reinforcement learning tasks, gathering data is often costly.
In Chapter 4, we describe a method for efficiently reusing previously
collected samples in the framework of covariate shift adaptation (Sugiyama &
Kawanabe, 2012). In Chapter 5, we apply a statistical active learning
technique (Sugiyama & Kawanabe, 2012) to optimizing data collection strategies
for reducing the sampling cost.
Finally, in Chapter 6, an outlier-robust extension of the least-squares
method based on robust regression (Huber, 1981) is introduced. Such a
robust method is highly useful in handling noisy real-world data.