7 Derivative-Free Stochastic Search
There are many settings where we wish to solve
$$\max_{x \in \mathcal{X}} \mathbb{E}\, F(x, W),$$
which is the same problem that we introduced in the beginning of chapter 5. When we are using derivative-free stochastic search, we assume that we can choose a point $x^n$ according to some policy that uses a belief about the function that we can represent by $\bar{F}^n(x)$ (as we show below, there is more to the belief than a simple estimate of the function). Then, we observe the performance $\hat{F}^{n+1} = F(x^n, W^{n+1})$. Random outcomes $W^{n+1}$ can be the response of a patient to a drug, the number of ad-clicks from displaying a particular ad, the strength of a material from a mixture of inputs and how the material is prepared, or the time required to complete a path over a network. After we run our experiment, we use the observed performance $\hat{F}^{n+1}$ to obtain an updated belief about the function, $\bar{F}^{n+1}(x)$.
We may use derivative-free stochastic search because we do not have access to the derivative (or gradient) $\nabla_x F(x, W)$, or even a numerical approximation of the derivative. The most obvious examples arise when $x$ is a member of a discrete set $\mathcal{X} = \{x_1, \ldots, x_M\}$, such as a set of drugs or materials, or perhaps different choices of websites. In addition, $x$ may be continuous, and yet we cannot even approximate a derivative. For example, we may want to test a drug dosage on a patient, but we can only do this by trying different ...
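The sequential loop described above (choose a point with a policy built on the current belief, observe a noisy performance, update the belief) can be sketched in a few lines of Python. This is a minimal illustration, not the book's specific algorithm: the quadratic truth, the Gaussian noise, and the epsilon-greedy policy are assumptions chosen only to make the loop concrete. Any policy that maps a belief to a point to sample, paired with any belief-updating rule, fits the same template.

```python
import random

random.seed(1)

# Illustrative assumptions (not from the book): five discrete alternatives,
# a hidden quadratic truth, and Gaussian observation noise.
X = [0, 1, 2, 3, 4]                       # discrete set of alternatives
truth = {x: -(x - 2.7) ** 2 for x in X}   # unknown true mean of F(x, W)

def F(x):
    """Noisy experiment F(x, W): the true mean plus a random outcome W."""
    return truth[x] + random.gauss(0.0, 1.0)

F_bar = {x: 0.0 for x in X}   # belief F-bar^n about the function
counts = {x: 0 for x in X}    # how many times each alternative was sampled

def policy(epsilon=0.1):
    """Epsilon-greedy policy built on the belief: try each point once,
    then mostly exploit the current best estimate, occasionally explore."""
    untried = [x for x in X if counts[x] == 0]
    if untried:
        return untried[0]
    if random.random() < epsilon:
        return random.choice(X)
    return max(X, key=lambda x: F_bar[x])

for n in range(500):
    x_n = policy()      # choose a point using the belief
    F_hat = F(x_n)      # run the experiment, observe the performance
    counts[x_n] += 1
    # running-mean update: fold the observation into the belief at x_n
    F_bar[x_n] += (F_hat - F_bar[x_n]) / counts[x_n]

best = max(X, key=lambda x: F_bar[x])
print(best)   # the alternative our final belief ranks highest
```

Swapping the running-mean update for a Bayesian posterior, or the epsilon-greedy rule for an interval-estimation or knowledge-gradient policy, changes only the two functions above; the structure of the loop is the same.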