Q-Learning example

To illustrate the Q-Learning algorithm, consider a simple deterministic environment, shown in the following figure. The environment has six states, and the figure shows the reward for each allowed transition. The reward is non-zero in only two cases: a transition into the Goal (G) state earns +100, while a move into the Hole (H) state earns -100. Both are terminal states, so reaching either one ends an episode that began at the Start state:


Figure 9.3.1: Rewards in a simple deterministic world
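Because the figure itself is not reproduced here, the following is only a minimal sketch of such an environment and of tabular Q-Learning on it, not the book's implementation. It assumes a grid of 2 rows by 3 columns with states named by (row, column), Start at (0, 0), Goal at (0, 2), and Hole at (1, 2); it also assumes that a move off the grid leaves the agent in place. The names step and train, and the values of alpha, gamma, and epsilon, are hypothetical choices made for illustration.

```python
import random

# ASSUMED layout (the figure is not reproduced): 2 rows x 3 columns,
# Start at (0, 0), Goal at (0, 2), Hole at (1, 2).
N_ROWS, N_COLS = 2, 3
START, GOAL, HOLE = (0, 0), (0, 2), (1, 2)
ACTIONS = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}


def step(state, action):
    """Deterministic transition: return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col
    # ASSUMPTION: a move that would leave the grid keeps the agent in place.
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        row, col = state
    next_state = (row, col)
    if next_state == GOAL:
        return next_state, 100.0, True   # +100 for entering the Goal
    if next_state == HOLE:
        return next_state, -100.0, True  # -100 for entering the Hole
    return next_state, 0.0, False        # every other transition pays 0


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-Learning with an epsilon-greedy policy (illustrative values)."""
    rng = random.Random(seed)
    # One Q-value per (state, action) pair, all initialized to zero.
    q = {(r, c): {a: 0.0 for a in ACTIONS}
         for r in range(N_ROWS) for c in range(N_COLS)}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))        # explore
            else:
                action = max(q[state], key=q[state].get)  # exploit
            next_state, reward, done = step(state, action)
            # Terminal states have no future value.
            best_next = 0.0 if done else max(q[next_state].values())
            # Move Q(s, a) toward the target r + gamma * max_a' Q(s', a').
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q_table = train()
    for state, values in sorted(q_table.items()):
        print(state, {a: round(v, 1) for a, v in values.items()})
```

Running the script prints the learned Q-values per state. Under the assumed layout, the action with the highest value in the Start state should be the one that heads toward the Goal.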

To identify each state unambiguously, we use a (row, column) identifier, as shown in the following figure. Since the agent has not ...
