6 Stepsize Policies

There is a wide range of adaptive learning problems that depend on an iteration of the form we first saw in chapter 5, which looks like

$$x^{n+1} = x^n + \alpha_n \nabla_x F(x^n, W^{n+1}). \qquad (6.1)$$

The stochastic gradient $\nabla_x F(x^n, W^{n+1})$ tells us what direction to go in, but we need the stepsize $\alpha_n$ to tell us how far we should move.
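To make the roles of the gradient and the stepsize concrete, the following is a minimal sketch of the update (6.1) in Python with a pluggable stepsize rule. The quadratic objective $F(x,W) = -\tfrac{1}{2}(x-W)^2$ and the harmonic stepsize $\alpha_n = \theta/(\theta + n - 1)$ are illustrative assumptions, not prescriptions from the text.

```python
# A minimal sketch of the update in (6.1): stochastic gradient ascent with a
# pluggable stepsize policy. The objective and the stepsize rule are
# assumptions made for illustration only.
import random


def harmonic_stepsize(n, theta=10.0):
    """Harmonic stepsize alpha_n = theta / (theta + n - 1), with n >= 1."""
    return theta / (theta + n - 1)


def stochastic_gradient(x, w):
    """Gradient of F(x, W) = -0.5 * (x - W)**2 with respect to x."""
    return w - x


x = 0.0                                   # starting point x^0
for n in range(1, 101):
    w = random.gauss(5.0, 1.0)            # sample W^{n+1}
    g = stochastic_gradient(x, w)         # nabla_x F(x^n, W^{n+1})
    x = x + harmonic_stepsize(n) * g      # the update in equation (6.1)

print(f"x after 100 iterations: {x:.3f}  (maximizer of E F(x, W) is 5.0)")
```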

There are two important settings where this formula is used. The first is where we are maximizing some metric such as contributions, utility, or performance. In these settings, the units of $\nabla_x F(x^n, W^{n+1})$ and the decision variable $x$ are different, so the stepsize has to perform the scaling so that the size of $\alpha_n \nabla_x F(x^n, W^{n+1})$ is neither too large nor too small relative to $x^n$.
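As an illustration of this scaling issue (the example is ours, not from the text), consider a newsvendor-style profit $F(x,W) = p\min(x,W) - cx$, where the decision $x$ is an order quantity but the stochastic gradient is measured in dollars per unit. The sketch below scales a harmonic stepsize by a constant $\alpha_0$ so that $\alpha_n \nabla_x F(x^n, W^{n+1})$ produces steps of a reasonable size in the units of $x$.

```python
# An illustrative sketch (not from the text) of the scaling issue. Here x is
# an order quantity while the stochastic gradient of the newsvendor profit
# F(x, W) = p*min(x, W) - c*x is in dollars per unit, so the stepsize must
# carry the scaling; alpha_0 below is a tunable scaling constant.
import random

p, c = 10.0, 4.0                       # price and cost (dollars per unit)
alpha_0, theta = 5.0, 20.0             # scaling constant and harmonic parameter


def grad_newsvendor(x, w):
    """A stochastic (sub)gradient of F(x, W) = p*min(x, W) - c*x at x."""
    return (p if x < w else 0.0) - c


x = 50.0                               # initial order quantity
for n in range(1, 1001):
    w = random.uniform(0.0, 200.0)     # observed demand W^{n+1}
    alpha_n = alpha_0 * theta / (theta + n - 1)
    x = max(0.0, x + alpha_n * grad_newsvendor(x, w))

print(f"Order quantity after 1000 iterations: {x:.1f}")
# The critical-ratio optimum for uniform(0, 200) demand is 200*(p - c)/p = 120.
```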

A second and very important setting arises in what is known as supervised learning. Here, we are trying to estimate some function $f(x|\theta)$ using observations $y = f(x|\theta) + \varepsilon$, which means $f(x|\theta)$ and $y$ have the same scale. We encounter these problems in three settings:

  • Approximating the function $\mathbb{E} F(x,W)$ to create an estimate $\bar{F}(x)$ that can be optimized (a minimal sketch of this setting follows the list below).
  • Approximating the value $V_t(S_t)$ of being in a state $S_t$ and then following some policy (we encounter this problem starting in chapters 16 and 17 when we introduce approximate dynamic programming).
  • Creating a parameterized policy $X^\pi(S|\theta)$ to fit observed decisions. Here, we assume we have access to some method of creating a decision $x$ and then we use this to create a parameterized ...
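For the first setting in this list, the following minimal sketch (with assumed Gaussian observation noise) rewrites the update (6.1) in its smoothing form $\bar{F}^n = (1-\alpha_n)\bar{F}^{n-1} + \alpha_n y^n$, which is what (6.1) reduces to for the objective $-\tfrac{1}{2}(\bar{F}-y)^2$. Because the observation $y$ and the estimate share the same scale, $\alpha_n$ acts as a unitless weight in $[0,1]$, and choosing $\alpha_n = 1/n$ reproduces the sample average.

```python
# A minimal sketch, with assumed Gaussian noise, of estimating a value via the
# smoothing form of (6.1): Fbar^n = (1 - alpha_n) * Fbar^{n-1} + alpha_n * y^n.
# With alpha_n = 1/n this reproduces the sample average of the observations.
import random

true_value = 3.0
fbar = 0.0                                    # initial estimate Fbar^0
observations = []

for n in range(1, 201):
    y = true_value + random.gauss(0.0, 1.0)   # noisy observation y^n
    observations.append(y)
    alpha_n = 1.0 / n
    fbar = (1.0 - alpha_n) * fbar + alpha_n * y

sample_mean = sum(observations) / len(observations)
print(f"Smoothed estimate: {fbar:.4f}")
print(f"Sample average:    {sample_mean:.4f}")   # matches the smoothed estimate
```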
