At the heart of RL is the Markov decision process (MDP). An MDP is often described as a discrete-time stochastic control process. In simpler terms, it is a mathematical framework for modeling decisions made over a sequence of time steps: at each step, an agent in some state chooses an action, the environment moves to a new state with a probability that depends on the current state and that action, and the agent receives a reward. This framework is widely used in automated control for robotics, drones, and networking, and, of course, in RL. The classic way we picture this process is shown in the following diagram:
We represent an MDP as a tuple or vector, using the following variables:
- - being a ...
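
To make the idea concrete, here is a minimal sketch of a toy MDP in Python. It assumes the common textbook convention of four components (a set of states, a set of actions, transition probabilities, and rewards); the state names, action names, probabilities, and reward values are all hypothetical and chosen only for illustration, as are the helper names `step` and `rollout`.

```python
import random

# Hypothetical two-state, two-action MDP; every name and number here is
# an illustrative assumption, not taken from the text.
STATES = ["idle", "moving"]
ACTIONS = ["wait", "go"]

# TRANSITIONS[(s, a)] maps each possible next state s' to P(s' | s, a).
TRANSITIONS = {
    ("idle", "wait"): {"idle": 1.0},
    ("idle", "go"): {"moving": 0.8, "idle": 0.2},
    ("moving", "wait"): {"idle": 0.6, "moving": 0.4},
    ("moving", "go"): {"moving": 1.0},
}

# REWARDS[(s, a, s')] is the immediate reward; unlisted transitions give 0.
REWARDS = {
    ("idle", "go", "moving"): 1.0,
    ("moving", "go", "moving"): 2.0,
    ("moving", "wait", "idle"): -1.0,
}


def step(state, action, rng):
    """Sample one time step and return (next_state, reward)."""
    dist = TRANSITIONS[(state, action)]
    next_states = list(dist)
    weights = list(dist.values())
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    reward = REWARDS.get((state, action, next_state), 0.0)
    return next_state, reward


def rollout(start, policy, n_steps, seed=0):
    """Follow a policy (a function from state to action) for n_steps."""
    rng = random.Random(seed)
    state, total = start, 0.0
    trajectory = []
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward = step(state, action, rng)
        trajectory.append((state, action, reward, next_state))
        total += reward
        state = next_state
    return trajectory, total
```

Calling `rollout("idle", lambda s: "go", 10)` would follow a policy that always chooses `go` and return the visited transitions together with the accumulated reward. The next state is sampled using only the current state and action, which is the Markov property that gives the process its name.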