October 2019
Intermediate to advanced
366 pages
12h 4m
English
Put simply, the multi-armed bandit problem, and more generally every exploration problem, can be solved either through random strategies or through smarter techniques. The best-known algorithm in the first category is ε-greedy, whereas optimistic exploration, such as UCB, and posterior exploration, such as Thompson sampling, belong to the second. In this section, we'll look in particular at the ε-greedy and UCB strategies.
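To make the two categories concrete, here is a minimal sketch of both action-selection rules on a Bernoulli bandit. It is an illustration under stated assumptions, not the book's own implementation: the function names (`epsilon_greedy`, `ucb1`, `run_bandit`), the exploration constant `c=2.0`, the default `epsilon=0.1`, and the 0/1 reward model are all hypothetical choices. The UCB rule follows the common UCB1 form, where each arm's estimated value gets a bonus that shrinks as the arm is pulled more often.

```python
import math
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Random exploration: with probability epsilon pick any arm uniformly,
    otherwise pick the arm with the highest current estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def ucb1(estimates, counts, t, c=2.0):
    """Optimistic exploration (UCB1-style): pick the arm with the highest
    estimate plus an uncertainty bonus. The constant c is an assumption."""
    # Pull every arm once first so that no count is zero.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(estimates)),
        key=lambda a: estimates[a] + math.sqrt(c * math.log(t) / counts[a]),
    )


def run_bandit(probs, steps, strategy, epsilon=0.1, seed=0):
    """Simulate a Bernoulli bandit whose arms pay 1 with probability probs[a].
    Returns per-arm pull counts, value estimates, and the total reward."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    estimates = [0.0] * len(probs)
    total = 0.0
    for t in range(1, steps + 1):
        if strategy == "epsilon_greedy":
            arm = epsilon_greedy(estimates, epsilon, rng)
        else:
            arm = ucb1(estimates, counts, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return counts, estimates, total


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.8]  # hypothetical win probabilities
    for name in ("epsilon_greedy", "ucb1"):
        counts, estimates, total = run_bandit(arms, 2000, name)
        print(name, counts, total)
```

The difference in design is visible in the selection functions: ε-greedy explores blindly, ignoring how often an arm has already been tried, while the UCB rule directs exploration toward arms whose estimates are still uncertain.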
It's all about balancing the risk and the reward. But, how ...