January 2020
Intermediate to advanced
432 pages
10h 18m
English
An RL problem fulfills the Markov property if all Markov signals/states predict a future state. Subsequently, a Markov signal or state is considered a Markov property if it enables the agent to predict values from that state. Likewise, a learning task that is a Markov property and is finite is called a finite Markov decision process, or MDP. A very classic example of an MDP used to often explain RL is shown here:
The Markov decision process (Dr. David Silver)Read now
Unlock full access