Learn Unity ML-Agents - Fundamentals of Unity Machine Learning by Micheal Lanham

MDP and the Bellman equation

If you have studied Reinforcement Learning previously, you may have already come across the terms MDP (Markov Decision Process) and the Bellman equation. An MDP is defined as a discrete-time stochastic control process (https://en.wikipedia.org/wiki/Stochastic_process). Put more simply, it is any process that makes decisions by combining mathematics with some amount of uncertainty. This rather vague description still fits well with how we have been using RL to make decisions. In fact, we have been developing MDPs throughout this chapter, so you should be fairly comfortable with the concept by now.
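To make the Bellman equation concrete, here is a minimal sketch under stated assumptions. It uses a tiny hand-made MDP, and the states, actions, transition probabilities, rewards, and discount factor `gamma` are all hypothetical and do not come from the book. The sketch applies the Bellman optimality update, one common form of the Bellman equation, V(s) = max_a Σ_{s'} P(s'|s,a) [R + γ·V(s')], over and over until the values stop changing. This procedure is called value iteration. The helper names `value_iteration` and `greedy_policy` are illustrative only and are not part of ML-Agents.

```python
# Minimal value-iteration sketch on a HYPOTHETICAL 3-state MDP.
# The states, actions, probabilities, and rewards are invented for illustration.

# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "start": {
        "safe": [(1.0, "middle", 0.0)],
        "risky": [(0.5, "goal", 1.0), (0.5, "start", 0.0)],
    },
    "middle": {
        "advance": [(0.9, "goal", 1.0), (0.1, "middle", 0.0)],
    },
    "goal": {},  # terminal state: no actions, so its value stays 0
}


def expected_return(outcomes, values, gamma):
    """One-step lookahead: sum over s' of P(s'|s,a) * (R + gamma * V(s'))."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Apply the Bellman optimality update until the largest change is below tol."""
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        new_values = {}
        for state, actions in transitions.items():
            # Terminal states keep a value of 0; others take the best action's return.
            new_values[state] = (
                max(expected_return(o, values, gamma) for o in actions.values())
                if actions
                else 0.0
            )
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_policy(transitions, values, gamma=0.9):
    """In each non-terminal state, pick the action with the best one-step lookahead."""
    return {
        state: max(actions, key=lambda a: expected_return(actions[a], values, gamma))
        for state, actions in transitions.items()
        if actions
    }


values = value_iteration(TRANSITIONS)
policy = greedy_policy(TRANSITIONS, values)
print({s: round(v, 3) for s, v in values.items()})
# -> {'start': 0.909, 'middle': 0.989, 'goal': 0.0}
print(policy)
# -> {'start': 'risky', 'middle': 'advance'}
```

The γ·V(s') term is what lets a decision account for future states and not just the immediate reward. In a one-step problem there is no next state to value, so the update reduces to picking the action with the best expected immediate reward.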

Up until now, we have only modeled the partial RL or one-step problem. Our observation of state was only ...