The Bellman equation of optimality
To explain the Bellman equation, it's better to go a bit abstract. Don't be afraid, I'll provide the concrete examples later to support your intuition! Let's start with a deterministic case, when all our actions have a 100% guaranteed outcome. Imagine that our agent observes state and has N available actions. Every action leads to another state, , with a respective reward, . Also assume that we know the values,
Get Deep Reinforcement Learning Hands-On now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.