7 Derivative-Free Stochastic Search
There are many settings where we wish to solve
$$\max_{x \in \mathcal{X}} \mathbb{E}\, F(x, W),$$
which is the same problem that we introduced in the beginning of chapter 5. When we are using derivative-free stochastic search, we assume that we can choose a point $x^n$ according to some policy that uses a belief about the function that we can represent by $\bar{F}^n(x)$ (as we show below, there is more to the belief than a simple estimate of the function). Then, we observe the performance $\hat{F}^{n+1} = F(x^n, W^{n+1})$. Random outcomes $W^{n+1}$ can be the response of a patient to a drug, the number of ad-clicks from displaying a particular ad, the strength of a material from a mixture of inputs and how the material is prepared, or the time required to complete a path over a network. After we run our experiment, we use the observed performance $\hat{F}^{n+1}$ to obtain an updated belief about the function, $\bar{F}^{n+1}(x)$.
We may use derivative-free stochastic search because we do not have access to the derivative (or gradient) $\nabla_x F(x, W)$, or even a numerical approximation of the derivative. The most obvious examples arise when $x$ is a member of a discrete set $\mathcal{X} = \{x_1, \ldots, x_M\}$, such as a set of drugs or materials, or perhaps different choices of websites. In addition, $x$ may be continuous, and yet we cannot even approximate a derivative. For example, we may want to test a drug dosage on a patient, but we can only do this by trying different ...
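The sequential loop described above (choose a point with a policy built on the current belief, observe a noisy performance, update the belief) can be sketched in a few lines of Python. This is a minimal illustration, not the book's specific algorithm: the quadratic truth, the Gaussian noise, and the epsilon-greedy policy are assumptions chosen only to make the loop concrete. Any policy that maps a belief to a point to sample, paired with any belief-updating rule, fits the same template.

```python
import random

random.seed(1)

# Illustrative assumptions (not from the book): five discrete alternatives,
# a hidden quadratic truth, and Gaussian observation noise.
X = [0, 1, 2, 3, 4]                       # discrete set of alternatives
truth = {x: -(x - 2.7) ** 2 for x in X}   # unknown true mean of F(x, W)

def F(x):
    """Noisy experiment F(x, W): the true mean plus a random outcome W."""
    return truth[x] + random.gauss(0.0, 1.0)

F_bar = {x: 0.0 for x in X}   # belief F-bar^n about the function
counts = {x: 0 for x in X}    # how many times each alternative was sampled

def policy(epsilon=0.1):
    """Epsilon-greedy policy built on the belief: try each point once,
    then mostly exploit the current best estimate, occasionally explore."""
    untried = [x for x in X if counts[x] == 0]
    if untried:
        return untried[0]
    if random.random() < epsilon:
        return random.choice(X)
    return max(X, key=lambda x: F_bar[x])

for n in range(500):
    x_n = policy()      # choose a point using the belief
    F_hat = F(x_n)      # run the experiment, observe the performance
    counts[x_n] += 1
    # running-mean update: fold the observation into the belief at x_n
    F_bar[x_n] += (F_hat - F_bar[x_n]) / counts[x_n]

best = max(X, key=lambda x: F_bar[x])
print(best)   # the alternative our final belief ranks highest
```

Swapping the running-mean update for a Bayesian posterior, or the epsilon-greedy rule for an interval-estimation or knowledge-gradient policy, changes only the two functions above; the structure of the loop is the same.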