October 2019
Intermediate to advanced
366 pages
12h 4m
English
We saw how UCB, and UCB1 in particular, can reduce the overall regret and achieve optimal convergence on the multi-armed bandit problem with a relatively simple algorithm. However, the multi-armed bandit is a simple, stateless task.
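As a reminder of how little machinery UCB1 needs, here is a minimal sketch on a Bernoulli bandit. The function name `ucb1`, the parameters `arm_means`, `n_rounds`, and `seed`, and the exploration bonus `sqrt(2 * ln(t) / n_a)` (the standard UCB1 form) are assumptions chosen for illustration, not necessarily the exact implementation used earlier in the book.

```python
import math
import random


def ucb1(arm_means, n_rounds, seed=0):
    """Run UCB1 on a Bernoulli bandit (illustrative sketch).

    arm_means: true success probability of each arm (hypothetical test setup).
    Returns the pull counts per arm and the cumulative expected regret.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of times each arm was pulled
    values = [0.0] * n_arms    # running mean reward of each arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Pull every arm once so that each count is non-zero.
            arm = t - 1
        else:
            # Pick the arm with the highest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )

        # Sample a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the running mean.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

        # Expected regret of this pull relative to the best arm.
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.5, 0.8], n_rounds=5000)
    print("pulls per arm:", counts)
    print("cumulative regret:", round(regret, 1))
```

Note that the only state the algorithm keeps is a count and a running mean per arm. Nothing in it refers to a state of the environment, which is why the stateless bandit setting is such a natural fit.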
So, how will UCB perform on more complex tasks? To answer this question, we can simplify the picture and group all problems into three main categories: