12 Policy Function Approximations and Policy Search

A policy function approximation (PFA) is any analytical function mapping a state to an action. These “analytical functions” come in three broad (and overlapping) flavors:

  • Lookup tables – These take discrete inputs and produce a discrete output. Examples are: “If the chess board is in this state, I take this move” or “If this is a male patient, over 50, never smoked, high blood sugar, then take this medication.”
  • Parametric functions – These can be linear or nonlinear models, including neural networks. The user has to specify the structure of the model, which is assumed to be governed by a vector of parameters θ, and then algorithms search for the best values of the parameters.
  • Nonparametric functions – These might be locally constant approximations, locally linear approximations defined over regions, or high-dimensional nonlinear functions such as deep neural networks.
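
To make the first two flavors concrete, here is a minimal Python sketch. Nothing in it comes from the book: the patient table, the order-up-to inventory rule, the cost coefficients, the demand history, and every function name are hypothetical and chosen only for illustration. The lookup table maps a discrete state straight to an action, while the parametric rule is governed by a single parameter θ that a brute-force search tunes against simulated cost.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical state: (sex, over_50, ever_smoked, high_blood_sugar).
PatientState = Tuple[str, bool, bool, bool]

# Hypothetical table entry mirroring the patient example in the text.
TREATMENT_TABLE: Dict[PatientState, str] = {
    ("male", True, False, True): "medication_A",
}


def lookup_pfa(state: PatientState, default: str = "no_medication") -> str:
    """Lookup-table PFA: a discrete state maps directly to a discrete action."""
    return TREATMENT_TABLE.get(state, default)


def order_up_to_pfa(inventory: float, theta: float) -> float:
    """Parametric PFA (assumed example): order enough to reach the level theta."""
    return max(0.0, theta - inventory)


def simulate_cost(theta: float, demands: Sequence[float],
                  order_cost: float = 1.0, holding_cost: float = 0.5,
                  shortage_cost: float = 4.0) -> float:
    """Total cost of following order_up_to_pfa over a fixed demand sequence."""
    inventory, total = 0.0, 0.0
    for demand in demands:
        order = order_up_to_pfa(inventory, theta)
        inventory += order
        sales = min(inventory, demand)
        inventory -= sales
        # Pay for the order, for leftover stock, and for unmet demand.
        total += (order_cost * order
                  + holding_cost * inventory
                  + shortage_cost * (demand - sales))
    return total


def policy_search(candidates: Sequence[float], demands: Sequence[float]) -> float:
    """Brute-force policy search: return the candidate theta with the lowest cost."""
    return min(candidates, key=lambda theta: simulate_cost(theta, demands))


demands = [4.0, 7.0, 5.0, 9.0, 6.0]           # made-up demand history
candidates = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]  # made-up grid of theta values
best_theta = policy_search(candidates, demands)
print(lookup_pfa(("male", True, False, True)), best_theta)
```

Neither policy solves an optimization problem when it is called; the only search happens offline, over θ. A grid search is used here purely for brevity, and a practical policy search would typically rely on a stochastic search method and on sampled rather than fixed demands.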

What distinguishes policy function approximations from the other classes of policies we introduce later in the book is that each of the remaining classes has an embedded optimization problem within the policy. As a result, PFAs are the simplest class of policies and the easiest to compute, but they typically require a human to specify the architecture. Not surprisingly, given the wide range of decisions that we encounter throughout life, most decisions are made with simple rules that can be characterized as PFAs, so PFAs are arguably the most widely used ...
