MDP

The problem of reinforcement learning can be formulated as the MDP. The MDP is the mathematical notion to define the mutual interaction between states, actions, and rewards in the environment. It contains the following factors:

  • Set of states S
  • Space of actions A(s)
  • Initial states
  • State transition function
  • Reward function

The MDP is a probabilistic process described by these factors. S contains all possible states in the process. A(s)

Get Hands-On Machine Learning with TensorFlow.js now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.