12. Invariant Description for a Batch Version of the UCB Strategy with Unknown Control Horizon

We consider a batch-processing version of the UCB strategy for multi-armed bandits with an a priori unknown control horizon n. The random rewards of the considered bandits may follow a wide range of distributions with finite variances. In the batch scenario, an arm of the bandit can be switched only after it has been used a fixed number of times; parallel processing is also possible in this case. We consider the case of close distributions, in which expected rewards differ by a magnitude of order N^{-1/2} for some fairly large N, since it yields the highest normalized regret. Invariant descriptions are obtained for the upper bounds used in the strategy and for the minimax regret. We perform a series of Monte Carlo simulations to estimate the minimax regret of the considered bandits. The maximum regret is attained for n proportional to N, as expected from the obtained descriptions.
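
As a rough guide to what an invariant description looks like (the parametrization below is an assumption modeled on related work in this line of research, not quoted from this chapter), the close-distributions regime can be written as

$$m_l = m + c_l \sqrt{D/N}, \qquad l = 1, \dots, j,$$

where the $m_l$ are the expected rewards of the arms and $D$ bounds the reward variances. Under this scaling, the regret normalized by $\sqrt{DN}$ depends on the horizon only through the ratio n/N, which is why the maximum of the normalized regret is sought at n proportional to N.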

12.1. Introduction

The multi-armed bandit (MAB) problem is a control problem in a random environment. Traditionally, it is pictured as a slot machine with j (two or more) arms (levers). Choosing an arm at time n yields a random reward (income) ξ(n) associated with that arm. The gambler (the decision-making agent) begins with no prior knowledge of the rewards associated with the arms.

The goal is to maximize the total expected reward.
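
To make this setting concrete, here is a minimal Monte Carlo sketch of a batch UCB run, assuming Gaussian rewards with known variance σ² and a standard UCB1-style index with exploration term σ√(2 ln t / N_l); the exact upper bounds whose invariant description the chapter derives may differ, and all function and parameter names here are purely illustrative.

    import numpy as np

    def batch_ucb_regret(means, horizon, batch_size, sigma=1.0, seed=0):
        """Run a batch UCB strategy on a Gaussian bandit; return realized regret.

        The chosen arm can change only between batches of `batch_size` pulls,
        matching the batch-processing constraint; the pulls within a batch
        could equally well be executed in parallel.
        """
        rng = np.random.default_rng(seed)
        counts = np.zeros(len(means))   # pulls of each arm so far
        sums = np.zeros(len(means))     # cumulative reward of each arm
        total = 0.0                     # total collected reward
        t = 0
        while t < horizon:
            m = min(batch_size, horizon - t)
            if counts.min() == 0:
                # Play every arm for one batch before trusting the indices.
                arm = int(np.argmin(counts))
            else:
                # UCB1-style index; it never uses the horizon, so an
                # a priori unknown control horizon poses no difficulty.
                index = sums / counts + sigma * np.sqrt(2.0 * np.log(t) / counts)
                arm = int(np.argmax(index))
            rewards = rng.normal(means[arm], sigma, size=m)
            counts[arm] += m
            sums[arm] += rewards.sum()
            total += rewards.sum()
            t += m
        # Regret relative to always playing the best arm.
        return horizon * max(means) - total

    # Close-distributions regime: expected rewards differ by order N^{-1/2}.
    N = 10_000
    print(batch_ucb_regret(means=[0.0, 1.0 / np.sqrt(N)], horizon=N, batch_size=50))

Averaging such runs over many seeds and sweeping the horizon n for a fixed N is one way to probe the behavior described in the abstract, namely that the normalized regret peaks when n is proportional to N.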

A classic reinforcement learning problem that ...
