Reinforcement learning is an iterative interaction between an agent (the decision-maker) and its environment (the process). At each timestep:
- The process is in a state and the decision-maker may choose any action that is available in that state
- The process responds at the next timestep by randomly moving into a new state and giving the decision-maker a corresponding reward
- The probability of moving into a particular new state depends on the current state and the chosen action, and is given by the state transition function
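
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the two-state process, its state and action names (`cool`, `hot`, `work`, `rest`), the probabilities, the rewards, and the helper names `available_actions`, `step`, and `run_episode` are all hypothetical and chosen only to make the three steps concrete. The decision-maker here simply picks actions uniformly at random; no learning takes place.

```python
import random

# Hypothetical two-state process, invented purely for illustration.
# TRANSITIONS[state][action] is a list of (next_state, probability, reward).
# This table plays the role of the state transition function.
TRANSITIONS = {
    "cool": {
        "work": [("cool", 0.7, 1.0), ("hot", 0.3, 1.0)],
        "rest": [("cool", 1.0, 0.0)],
    },
    "hot": {
        "work": [("hot", 1.0, -1.0)],
        "rest": [("cool", 0.8, 0.0), ("hot", 0.2, 0.0)],
    },
}


def available_actions(state):
    """Return the actions the decision-maker may choose in this state."""
    return list(TRANSITIONS[state])


def step(state, action, rng):
    """Sample the next state and reward for the chosen action."""
    outcomes = TRANSITIONS[state][action]
    weights = [probability for _, probability, _ in outcomes]
    next_state, _, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def run_episode(start_state, num_steps, seed=0):
    """Run the agent-environment loop for a fixed number of timesteps."""
    rng = random.Random(seed)
    state = start_state
    history = []
    for _ in range(num_steps):
        # 1. The decision-maker picks an action available in the current state
        #    (assumption: a uniformly random policy, used only as a placeholder).
        action = rng.choice(available_actions(state))
        # 2. The process moves randomly into a new state and emits a reward.
        next_state, reward = step(state, action, rng)
        history.append((state, action, reward, next_state))
        state = next_state
    return history
```

Calling `run_episode("cool", 5)` returns a list of five `(state, action, reward, next_state)` tuples, one per timestep. Storing the transition function as an explicit table keeps the dependence of the next-state probabilities on both the current state and the chosen action visible at a glance.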