An example of value iteration using the Bellman equation

Consider the following environment and the given information:

Given information:

  • A, C, and X are the names of some states.

  • The green-colored state is the goal state, G, with a reward of +1.

  • The red-colored state is the bad state, B, with a reward of -1, try to prevent your agent from entering this state

  • Thus, the green and red states are the terminal states, enter either and the game is over. If the agent encounters the green state, that is, the goal state, the agent wins, while if they enter the red state, then the agent loses the game.

  • ,  (that is, reward for all states except ...

Get Reinforcement Learning with TensorFlow now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.