Thompson sampling

Thompson sampling is a simple strategy, introduced 80 years ago, that has received renewed attention in recent years. It is widely used in advertising displays, marketing surveys, and financial analysis. Thompson sampling is a Bayesian strategy, also known as probability matching: the probability of selecting arm n is the probability that n is the arm with the maximum reward [14:4].

The strategy can be summarized as:

Assign a uniform prior distribution to each arm before any selection
Select arm n with a probability that matches the posterior probability that n is optimal (probability matching)
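
The two steps above can be sketched in Scala for the common case of binary (success/failure) rewards. This is a minimal illustration under stated assumptions, not the implementation from the book: the names BetaArm and ThompsonSampler are hypothetical, the rewards are assumed to be Bernoulli, and the uniform prior is expressed as a Beta(1, 1) distribution. Drawing one sample from each arm's posterior and playing the arm with the largest sample selects each arm with the posterior probability that it is optimal.

```scala
import scala.util.Random

// Hypothetical names for illustration only; not the book's API.
// Posterior of one arm under Bernoulli rewards (an assumption of this sketch).
final case class BetaArm(successes: Int = 0, failures: Int = 0) {
  // A uniform prior is Beta(1, 1); observed counts are added to it.
  def alpha: Int = successes + 1
  def beta: Int = failures + 1

  def update(reward: Boolean): BetaArm =
    if (reward) copy(successes = successes + 1)
    else copy(failures = failures + 1)
}

object ThompsonSampler {
  // Gamma(k, 1) for an integer shape k >= 1, as a sum of k Exp(1) draws.
  private def sampleGamma(k: Int, rng: Random): Double =
    (1 to k).map(_ => -math.log(1.0 - rng.nextDouble())).sum

  // Beta(a, b) for integer a, b >= 1 as X / (X + Y),
  // with X ~ Gamma(a, 1) and Y ~ Gamma(b, 1).
  def sampleBeta(a: Int, b: Int, rng: Random): Double = {
    val x = sampleGamma(a, rng)
    val y = sampleGamma(b, rng)
    x / (x + y)
  }

  // Probability matching: draw one sample per posterior, play the largest.
  def selectArm(arms: Vector[BetaArm], rng: Random): Int =
    arms.indices.maxBy(i => sampleBeta(arms(i).alpha, arms(i).beta, rng))

  // Simulates `steps` plays against arms with the given true success rates.
  def run(trueRates: Vector[Double], steps: Int, seed: Long): Vector[BetaArm] = {
    val rng = new Random(seed)
    val prior = Vector.fill(trueRates.size)(BetaArm())
    (1 to steps).foldLeft(prior) { (arms, _) =>
      val n = selectArm(arms, rng)
      val reward = rng.nextDouble() < trueRates(n)
      arms.updated(n, arms(n).update(reward))
    }
  }
}
```

For example, ThompsonSampler.run(Vector(0.1, 0.8), 2000, 42L) returns the updated posteriors for two simulated arms; the arm with the higher true success rate should accumulate most of the plays as its posterior sharpens. The sum-of-exponentials Beta sampler works only for integer parameters, which is sufficient here because the counts are integers.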

Bandit context

So far, we have discussed K-armed bandits that do not maintain a state or context. It is assumed that all the arms ...


Scala for Machine Learning - Second Edition by Patrick R. Nicolas
