UCB1 belongs to the UCB family, and its contribution is in the selection of the upper confidence bound term.

In UCB1, the UCB is computed by keeping track of the number of times an action $a$ has been selected, $N(a)$, and the total number of actions selected so far, $T$, as represented in the following formula:

$$U(a) = \sqrt{\frac{2 \log T}{N(a)}}$$

The uncertainty of an action is thus related ...
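
To make the selection rule concrete, here is a minimal sketch of UCB1 on a Bernoulli bandit. It is an illustration under stated assumptions, not the book's implementation: the function names `ucb1_select` and `run_bandit`, the Bernoulli reward model, and the constant `c = 2` inside the square root are all choices made for this example. The code writes N(a) for the number of times action a has been selected and T for the total number of selections so far.

```python
import math
import random


def ucb1_select(counts, values, total_pulls, c=2.0):
    """Return the index of the action with the highest UCB1 score.

    counts[a]   : number of times action a has been selected, N(a)
    values[a]   : running mean reward of action a
    total_pulls : total number of selections so far, T
    c           : constant inside the square root (assumed to be 2 here)
    """
    # Try every action once first: the bonus is undefined when N(a) == 0.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    # Score = estimated value + exploration bonus sqrt(c * log(T) / N(a)).
    scores = [
        q + math.sqrt(c * math.log(total_pulls) / n)
        for q, n in zip(values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(probs, steps=2000, seed=0):
    """Play a hypothetical Bernoulli bandit with UCB1; return counts and value estimates."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)
    for t in range(steps):
        a = ucb1_select(counts, values, t)
        reward = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        # Incremental update of the mean reward for the chosen action.
        values[a] += (reward - values[a]) / counts[a]
    return counts, values


if __name__ == "__main__":
    counts, values = run_bandit([0.2, 0.8])
    print("selections per action:", counts)
    print("estimated values:", [round(v, 3) for v in values])
```

Because the bonus shrinks as N(a) grows and grows slowly with T, rarely tried actions keep getting revisited, while well-explored actions are chosen mainly on their estimated value.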
