Chapter 3: Contextual Bandits
A more advanced version of the multi-armed bandit is the contextual bandit (CB) problem, in which decisions are tailored to the context in which they are made. In the previous chapter, we identified the best-performing ad in an online advertising scenario. In doing so, we did not use any information about, for instance, the user's persona, age, gender, location, or previous visits, which could have increased the likelihood of a click. Contextual bandits allow us to leverage such information, which is why they play a central role in commercial personalization and recommendation applications.
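To make the idea concrete, here is a minimal sketch of one simple way to use context: a tabular epsilon-greedy contextual bandit that keeps a separate click-rate estimate for every (context, ad) pair. It assumes the context is a single discrete user segment. The segment names, the click probabilities in `CLICK_PROBS`, and the class and function names are all hypothetical, chosen only for illustration. This is not necessarily the method developed in this chapter. Practical systems usually work with feature vectors and function approximation rather than a lookup table.

```python
import random
from collections import defaultdict

# Hypothetical click probabilities for 3 ads, per user segment (made-up numbers).
CLICK_PROBS = {
    "young": [0.30, 0.05, 0.10],
    "senior": [0.05, 0.25, 0.10],
}


class EpsilonGreedyContextualBandit:
    """Tabular contextual bandit: one value estimate per (context, arm) pair."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per-context pull counts and mean-reward estimates, created lazily.
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.values = defaultdict(lambda: [0.0] * n_arms)

    def best_arm(self, context):
        # Greedy choice for this context (ties go to the lowest arm index).
        values = self.values[context]
        return max(range(self.n_arms), key=values.__getitem__)

    def select(self, context):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return self.best_arm(context)

    def update(self, context, arm, reward):
        # Incremental sample-mean update for the chosen (context, arm) pair.
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.values[context][arm] += (reward - self.values[context][arm]) / n


def simulate(n_steps=20_000, seed=0):
    env_rng = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(n_arms=3, epsilon=0.1, seed=seed + 1)
    contexts = list(CLICK_PROBS)
    clicks = 0.0
    for _ in range(n_steps):
        context = env_rng.choice(contexts)  # a user arrives; we observe the segment
        arm = bandit.select(context)        # the ad shown depends on the context
        reward = 1.0 if env_rng.random() < CLICK_PROBS[context][arm] else 0.0
        bandit.update(context, arm, reward)
        clicks += reward
    return bandit, clicks / n_steps


if __name__ == "__main__":
    bandit, ctr = simulate()
    for segment in CLICK_PROBS:
        print(segment, "-> best ad:", bandit.best_arm(segment))
    print("observed click-through rate:", round(ctr, 3))
```

Under these made-up probabilities, a context-free bandit could at best show ad 0 to everyone, for an average click rate of (0.30 + 0.05) / 2 = 0.175. A policy that picks the right ad per segment can approach (0.30 + 0.25) / 2 = 0.275, minus whatever is lost to exploration. That gap is exactly the benefit of using context.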
Context is similar to a state in a multi-step reinforcement learning (RL) problem, with one key difference. In a multi-step RL problem, ...