The multi-arm bandit problem

Let me start with an analogy to understand this topic better. Do you like pizza? I like it a lot! My favorite restaurant in Bangalore serves delicious pizzas. I go to this place almost every time I feel like eating a pizza, and I am almost sure that I will get the best pizza. However, going to the same restaurant every time worries me that I am missing out on pizzas that are even better and served elsewhere in the town!

One alternative available is to try out restaurants one by one and sample the pizzas there, but this means that there is a very high probability that I will end up eating pizzas that aren't very nice. However, this is the one way for me to find a restaurant that serves better pizzas than the one ...

Get Advanced Machine Learning with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.