A Markov decision process is a framework used to represent the environment of a reinforcement learning problem. It is a graphical model with directed edges (meaning that one node of the graph points to another node). Each node represents a possible state in the environment, and each edge pointing out of a state represents an action that can be taken in the given state. For example, consider the following MDP:

Figure 5: A sample Markov Decision Process

The preceding MDP represents what a typical day of a programmer could look like. Each circle represents a particular state the programmer can be in, where the blue ...