6 Stepsize Policies
There is a wide range of adaptive learning problems that depend on an iteration of the form we first saw in chapter 5 that looks like

$$x^{n+1} = x^n + \alpha_n \nabla_x F(x^n, W^{n+1}).$$
The stochastic gradient tells us what direction to go in, but we need the stepsize to tell us how far we should move.
There are two important settings where this formula is used. The first is where we are maximizing some metric such as contributions, utility, or performance. In these settings, the units of the gradient $\nabla_x F(x^n, W^{n+1})$ and the decision variable $x$ are different, so the stepsize has to perform the scaling so that the size of the step $\alpha_n \nabla_x F(x^n, W^{n+1})$ is not too large or too small relative to $x^n$.
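To make this concrete, below is a minimal sketch of the iteration in Python, assuming a newsvendor-style objective $F(x, W) = p\min(x, W) - cx$ and a harmonic stepsize rule $\alpha_n = \theta/(\theta + n - 1)$; the price $p$, the cost $c$, the demand distribution, and the helper `stoch_grad` are illustrative assumptions, not from the text. Note that the sampled gradient is measured in dollars per unit while $x$ is in units, which is exactly the scaling mismatch the stepsize has to absorb.

```python
import numpy as np

# Sketch of the stochastic gradient iteration
#   x^{n+1} = x^n + alpha_n * grad_x F(x^n, W^{n+1})
# for an assumed newsvendor objective F(x, W) = p*min(x, W) - c*x,
# where x is an order quantity and W is random demand.

rng = np.random.default_rng(0)
p, c = 10.0, 4.0            # sell price and unit cost (dollars) -- illustrative
theta = 20.0                # tunable parameter of the harmonic stepsize rule

def stoch_grad(x, w):
    """Sampled gradient of p*min(x, w) - c*x with respect to x."""
    return (p if w > x else 0.0) - c

x = 50.0                                    # initial decision x^0
for n in range(1, 1001):
    w = rng.exponential(scale=100.0)        # demand sample W^{n+1} -- illustrative
    alpha = theta / (theta + n - 1)         # harmonic stepsize alpha_n
    x += alpha * stoch_grad(x, w)           # gradient is in $/unit; alpha rescales it

print(f"final order quantity: {x:.1f}")
```

The parameter $\theta$ is tunable: larger values keep the stepsize big for longer, which matters precisely because the gradient and the decision live on different scales.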
A second and very important setting arises in what is known as supervised learning. Here, we are trying to estimate some function $f(x)$ using observations $\hat{f}$. In this context, $f(x)$ and $\hat{f}$ have the same scale, so the stepsize acts as a dimensionless smoothing weight (see the sketch after the list below). We encounter these problems in three settings:
- Approximating the function $\mathbb{E}\, F(x, W)$ to create an estimate $\bar{F}(x)$ that can be optimized.
- Approximating the value $V(s)$ of being in a state $s$ and then following some policy $\pi$ (we encounter this problem starting in chapters 16 and 17 when we introduce approximate dynamic programming).
- Creating a parameterized policy $X^{\pi}(s|\theta)$ to fit observed decisions. Here, we assume we have access to some method of creating a decision $x^n$, and then we use this to create a parameterized policy that fits these decisions.
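In the supervised-learning settings above, the observation and the estimate share a scale, so the stepsize plays a different role: it is a dimensionless smoothing weight in $[0, 1]$. Below is a minimal sketch, assuming we smooth noisy scalar observations into an estimate (the observation model and the constants are illustrative). With $\alpha_n = 1/n$ the recursion reproduces a plain running average, which is why the $1/n$ rule is a natural baseline.

```python
import numpy as np

# Sketch of the smoothing form of the update,
#   fbar^{n+1} = (1 - alpha_n) * fbar^n + alpha_n * fhat^{n+1},
# where the estimate fbar and the observation fhat have the same scale,
# so alpha_n is a dimensionless weight in [0, 1].

rng = np.random.default_rng(1)
true_value = 25.0                            # illustrative quantity to estimate
fbar = 0.0                                   # initial estimate fbar^0

for n in range(1, 501):
    fhat = true_value + rng.normal(0.0, 5.0)     # noisy observation fhat^{n+1}
    alpha = 1.0 / n                              # 1/n stepsize = running average
    fbar = (1.0 - alpha) * fbar + alpha * fhat   # smoothing update

print(f"smoothed estimate: {fbar:.2f}")          # approaches 25 as n grows
```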