Learn Unity ML-Agents - Fundamentals of Unity Machine Learning by Micheal Lanham

Contextual bandits and state

Our next step in understanding RL is the contextual bandit problem. A contextual bandit extends the multi-armed bandit problem to multiple bandits, each producing different rewards. This type of problem has many applications in online advertising, where each user is thought of as a different bandit and the goal is to present the best advertisement for that user. To model the context, that is, which bandit we are facing, we add the concept of state, which we now interpret as identifying each of our different bandits. The following diagram shows the addition of state in the contextual bandit problem and where it lies on our path to glory:

Stateless, Contextual and Full RL models
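
To make the role of state concrete, here is a minimal Python sketch of a contextual bandit learner. It is an illustration only and not the implementation used in the book: the payout table `PAYOUT_PROBS`, the function names `pull`, `train`, and `best_arms`, the epsilon-greedy exploration rule, and the sample-average value update are all assumptions chosen for the example. The key point is that the value table is indexed by both state (which bandit) and action (which arm), rather than by action alone.

```python
import random

# Hypothetical setup: 3 bandits (states), each with 4 arms (actions).
# Each entry is the probability that pulling that arm on that bandit pays 1.0.
PAYOUT_PROBS = [
    [0.1, 0.2, 0.8, 0.3],  # bandit 0: arm 2 pays best
    [0.7, 0.1, 0.2, 0.1],  # bandit 1: arm 0 pays best
    [0.2, 0.9, 0.1, 0.3],  # bandit 2: arm 1 pays best
]


def pull(state, action, rng):
    """Return a reward of 1.0 or 0.0 for pulling `action` on bandit `state`."""
    return 1.0 if rng.random() < PAYOUT_PROBS[state][action] else 0.0


def train(episodes=20000, epsilon=0.1, seed=0):
    """Learn one value estimate per (state, action) pair with epsilon-greedy."""
    rng = random.Random(seed)
    n_states = len(PAYOUT_PROBS)
    n_actions = len(PAYOUT_PROBS[0])
    q = [[0.0] * n_actions for _ in range(n_states)]     # value estimates
    counts = [[0] * n_actions for _ in range(n_states)]  # pulls per pair

    for _ in range(episodes):
        # The environment presents a bandit at random; this is the state.
        state = rng.randrange(n_states)

        # Explore with probability epsilon, otherwise exploit the best-known arm.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])

        reward = pull(state, action, rng)

        # Incremental sample average: move the estimate toward the reward.
        counts[state][action] += 1
        q[state][action] += (reward - q[state][action]) / counts[state][action]

    return q


def best_arms(q):
    """Return the index of the highest-valued arm for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    q_table = train()
    print(best_arms(q_table))
```

With a single bandit the table collapses to one row, which is the stateless multi-armed bandit case. Adding rows is all it takes to move to the contextual version, because the reward still depends only on the current state and action, not on any earlier choice.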