8. The Multi-Armed Bandit Problem


In this chapter, we will introduce the popular Multi-Armed Bandit problem and some common algorithms used to solve it. We will learn how to implement some of these algorithms, such as Epsilon Greedy, Upper Confidence Bound, and Thompson Sampling, in Python via an interactive example. We will also learn about contextual bandits as an extension of the general Multi-Armed Bandit problem. By the end of this chapter, you will have a deep understanding of the general Multi-Armed Bandit problem and the skill to apply some common ways to solve it.


In the previous chapter, we discussed the technique of temporal difference learning, a popular model-free reinforcement learning algorithm that ...

Get The Reinforcement Learning Workshop now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.