December 2018
Beginner to intermediate
682 pages
18h 1m
English
A Markov decision process (MDP) formally describes an environment for reinforcement learning. Where:
Central idea of MDP: an MDP relies on the Markov property of a state; that is, S_{t+1} depends entirely on the latest state S_t rather than on any earlier history. In the following equation, the current state captures all the relevant information from the history, which means ...
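To make the Markov property concrete, here is a minimal Python sketch. It is not taken from the book: the state names, the transition probabilities, and the helpers `next_state` and `simulate` are all hypothetical, chosen only for illustration. The point is that the next state is sampled from the current state alone, and the earlier history is never consulted.

```python
import random

# Hypothetical two-state chain; names and probabilities are illustrative only.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state(current, rng):
    """Sample S_{t+1} using only the current state S_t."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def simulate(start, steps, seed=0):
    """Roll the chain forward and return the visited states."""
    rng = random.Random(seed)
    history = [start]
    for _ in range(steps):
        # Only history[-1] is read; earlier entries never influence the draw.
        history.append(next_state(history[-1], rng))
    return history


if __name__ == "__main__":
    print(simulate("sunny", 10))
```

The `history` list is kept only so the trajectory can be inspected afterwards. Replacing it with a single variable holding the current state would leave the sampling behaviour unchanged, which is exactly what the Markov property asserts.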