The **randomized probability matching bandit** algorithm is a Bayesian statistical approach to the problem of figuring out when to explore our options and when to exploit them for a nice payoff. It works by sampling a probability distribution that describes the probable mean of the payoff. As we gain more data, the variance of the possible means narrows significantly. As we go through the rest of this chapter, we'll delve deeper into how this algorithm works.

As a concrete example, we can run some simulations. The following is a histogram of repeatedly sampling means with 100 samples from a normal distribution:

plt.title('Distribution of means for N(35,5) distribution (sampling 100 vs 500 data points)') plt.xlabel('') ...

Start Free Trial

No credit card required