7 Derivative-Free Stochastic Search

There are many settings where we wish to solve

$$\max_{x \in \mathcal{X}} \mathbb{E}\bigl\{ F(x, W) \mid S^0 \bigr\}, \tag{7.1}$$

which is the same problem that we introduced in the beginning of chapter 5. When we are using derivative-free stochastic search, we assume that we can choose a point $x^n$ according to some policy that uses a belief about the function that we can represent by $\bar{F}^n(x) \approx \mathbb{E} F(x, W)$ (as we show below, there is more to the belief than a simple estimate of the function). Then, we observe the performance $\hat{F}^{n+1} = F(x^n, W^{n+1})$. Random outcomes can be the response of a patient to a drug, the number of ad-clicks from displaying a particular ad, the strength of a material from a mixture of inputs and how the material is prepared, or the time required to complete a path over a network. After we run our experiment, we use the observed performance $\hat{F}^{n+1}$ to obtain an updated belief about the function, $\bar{F}^{n+1}(x)$.
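To make this loop concrete, below is a minimal sketch of derivative-free stochastic search on a discrete set, assuming a lookup-table belief $\bar{F}^n(x)$ and a simple epsilon-greedy policy for choosing $x^n$. The noisy function `F`, the candidate set, and all parameter values are illustrative assumptions, not taken from the chapter:

```python
# A minimal sketch of the derivative-free stochastic search loop behind (7.1),
# assuming a discrete set X = {x_1, ..., x_M}, a lookup-table belief
# F_bar[x] ~ E F(x, W), and an epsilon-greedy policy. The noisy function F
# and all parameter choices here are illustrative, not from the text.
import numpy as np

rng = np.random.default_rng(0)

X = np.linspace(0.0, 1.0, 11)          # candidate decisions x_1, ..., x_M

def F(x, w):
    """Hypothetical noisy performance F(x, W): unknown true mean plus noise w."""
    return -(x - 0.3) ** 2 + w

F_bar = np.zeros(len(X))               # belief about E F(x, W) for each x
counts = np.zeros(len(X))              # number of times each x has been tried

N, epsilon = 200, 0.1
for n in range(N):
    # Policy: exploit the current belief, explore with probability epsilon.
    if rng.random() < epsilon:
        i = rng.integers(len(X))
    else:
        i = int(np.argmax(F_bar))
    # Run the experiment: observe F_hat^{n+1} = F(x^n, W^{n+1}).
    F_hat = F(X[i], rng.normal(scale=0.1))
    # Update the belief at x^n with stepsize 1/(number of visits to x^n).
    counts[i] += 1
    alpha = 1.0 / counts[i]
    F_bar[i] = (1 - alpha) * F_bar[i] + alpha * F_hat

print("estimated best x:", X[int(np.argmax(F_bar))])
```

With the stepsize $\alpha_n = 1/n_x$ (where $n_x$ counts observations of $x$), the belief $\bar{F}^n(x)$ is simply the sample average of the observations made at $x$; richer beliefs (for example, Bayesian priors with known variances) are developed later in the chapter.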

We may use derivative-free stochastic search because we do not have access to the derivative (or gradient) $\nabla_x F(x, W)$, or even a numerical approximation of the derivative. The most obvious examples arise when $x$ is a member of a discrete set $\mathcal{X} = \{x_1, \ldots, x_M\}$, such as a set of drugs or materials, or perhaps different choices of websites. In addition, $x$ may be continuous, and yet we cannot even approximate a derivative. For example, we may want to test a drug dosage on a patient, but we can only do this by trying different ...
