Chapter 2: Multi-Armed Bandits
When you log on to your favorite social media app, chances are you see one of the many versions of the app that are being tested at that time. When you visit a website, the ads displayed to you are tailored to your profile. On many online shopping platforms, prices are determined dynamically. Do you know what all these have in common? They are often modeled as multi-armed bandit (MAB) problems to identify optimal decisions. A MAB problem is a form of reinforcement learning (RL) in which the agent makes decisions over a problem horizon that consists of a single step. Therefore, the goal is to maximize only the immediate reward, and no consequences for any subsequent steps are considered. While this is a simplification ...
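To make the single-step setting concrete, here is a minimal Python sketch of a simulated bandit. It assumes Bernoulli rewards (1 for a click or conversion, 0 otherwise) and uses epsilon-greedy action selection, one common bandit strategy chosen here purely for illustration. The function name `run_epsilon_greedy`, the reward probabilities, and the parameter values are hypothetical and not taken from the chapter. Each loop iteration is an independent one-step decision: the agent picks an arm, observes an immediate reward, and updates its estimate, and that choice has no effect on any later step.

```python
import random


def run_epsilon_greedy(true_probs, steps=10_000, epsilon=0.1, seed=0):
    """Hypothetical sketch: simulate a Bernoulli multi-armed bandit.

    Each pull is a single-step decision whose only consequence is its
    immediate reward; nothing carries over to the next pull except the
    agent's own reward estimates.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    n_arms = len(true_probs)
    counts = [0] * n_arms              # how many times each arm was pulled
    values = [0.0] * n_arms            # running average reward per arm
    total_reward = 0

    for _ in range(steps):
        # Explore a random arm with probability epsilon,
        # otherwise exploit the arm with the best current estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])

        # Immediate reward: 1 with the arm's hidden success probability, else 0.
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return values, counts, total_reward


if __name__ == "__main__":
    # Three hypothetical app versions with assumed conversion rates.
    values, counts, total = run_epsilon_greedy([0.1, 0.5, 0.3])
    print("estimates:", [round(v, 3) for v in values])
    print("pulls per arm:", counts)
```

With these assumed conversion rates, most pulls should end up on the second version, since it has the highest true success probability. The `epsilon` parameter sets how often the agent explores other arms instead of exploiting its current best estimate.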