Multi-armed bandits

A/B testing is a great tool for evaluating ideas. But sometimes no single model is best overall: in one particular case one model performs better, and in another case a different one does. To select the model that is better at a particular moment, we can use online learning.

We can formulate this as a Reinforcement Learning problem: we have the agents (our search engine and the rankers), which interact with the environment (the users of the search engine) and get some reward (clicks). Our system then learns from the interaction by taking actions (selecting a ranker), observing the feedback, and choosing the best strategy based on it.
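To make the loop of action, feedback, and strategy more concrete, here is a minimal sketch of one simple bandit strategy, epsilon-greedy. It is an illustration under stated assumptions, not necessarily the approach developed in the rest of the chapter: the class name EpsilonGreedyRankerSelector, its methods, the optimistic treatment of untried rankers, and the click probabilities in main are all hypothetical and chosen only for the example.

```java
import java.util.Random;

/** Hypothetical epsilon-greedy selector over a fixed set of rankers. */
final class EpsilonGreedyRankerSelector {
    private final int[] shows;    // how many times each ranker was selected
    private final int[] clicks;   // how many clicks each ranker received
    private final double epsilon; // probability of exploring a random ranker
    private final Random random;

    EpsilonGreedyRankerSelector(int numRankers, double epsilon, long seed) {
        if (numRankers <= 0) {
            throw new IllegalArgumentException("numRankers must be positive");
        }
        if (epsilon < 0.0 || epsilon > 1.0) {
            throw new IllegalArgumentException("epsilon must be in [0, 1]");
        }
        this.shows = new int[numRankers];
        this.clicks = new int[numRankers];
        this.epsilon = epsilon;
        this.random = new Random(seed);
    }

    /** Action: choose which ranker serves the next query. */
    int selectRanker() {
        if (random.nextDouble() < epsilon) {
            return random.nextInt(shows.length); // explore: pick any ranker
        }
        int best = 0; // exploit: highest observed click rate, ties go to the lowest index
        for (int i = 1; i < shows.length; i++) {
            if (clickRate(i) > clickRate(best)) {
                best = i;
            }
        }
        return best;
    }

    /** Feedback: record whether the user clicked on a result from this ranker. */
    void observe(int ranker, boolean clicked) {
        shows[ranker]++;
        if (clicked) {
            clicks[ranker]++;
        }
    }

    /** Observed click rate; untried rankers count as 1.0 so each gets tried (an assumption). */
    double clickRate(int ranker) {
        return shows[ranker] == 0 ? 1.0 : (double) clicks[ranker] / shows[ranker];
    }

    int timesSelected(int ranker) {
        return shows[ranker];
    }

    public static void main(String[] args) {
        // Made-up click probabilities for two rankers, used only to simulate users.
        double[] clickProbability = {0.10, 0.30};
        Random users = new Random(1);
        EpsilonGreedyRankerSelector selector = new EpsilonGreedyRankerSelector(2, 0.1, 42L);
        for (int round = 0; round < 10_000; round++) {
            int ranker = selector.selectRanker();
            boolean clicked = users.nextDouble() < clickProbability[ranker];
            selector.observe(ranker, clicked);
        }
        for (int i = 0; i < clickProbability.length; i++) {
            System.out.printf("ranker %d: selected %d times, observed click rate %.3f%n",
                    i, selector.timesSelected(i), selector.clickRate(i));
        }
    }
}
```

Here epsilon controls the trade-off: with probability epsilon the selector tries a random ranker to keep gathering feedback, and otherwise it uses the ranker with the best click rate observed so far. In a real system the simulated users would be replaced by actual click logs.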

If we try to formulate A/B tests in this framework, then the action of the A/B test is ...
