Q-Learning in Python
The environment and the Q-Learning algorithm discussed in the previous section can be implemented in Python. Since the policy is just a simple table, there is no need for Keras at this point. Listing 9.3.1 shows
q-learning-9.3.1.py, the implementation of the simple deterministic world (environment, agent, action, and Q-Table algorithms) using the
QWorld class. For conciseness, the functions dealing with the user interface are not shown.
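To make the idea of a policy stored as a simple table concrete, the following is a minimal sketch and not the code of Listing 9.3.1. It assumes a deterministic world, so the update is Q(s, a) = r + gamma * max Q(s', a'). The table sizes, the discount factor, and the helper names make_q_table, greedy_action, and update_q are all hypothetical and chosen only for illustration.

```python
# Hypothetical sizes and discount factor, chosen only for illustration.
N_STATES = 6
N_ACTIONS = 4
GAMMA = 0.9


def make_q_table(n_states, n_actions):
    """Return a Q-Table of zeros: one row per state, one column per action."""
    return [[0.0] * n_actions for _ in range(n_states)]


def greedy_action(q_table, state):
    """Return the index of the highest-valued action in the given state."""
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])


def update_q(q_table, state, action, reward, next_state, gamma=GAMMA):
    """Deterministic-world update: Q(s, a) = r + gamma * max_a' Q(s', a')."""
    q_table[state][action] = reward + gamma * max(q_table[next_state])
    return q_table[state][action]


q_table = make_q_table(N_STATES, N_ACTIONS)
# Pretend that action 1 in state 0 led to state 3 with a reward of 1.0.
new_value = update_q(q_table, state=0, action=1, reward=1.0, next_state=3)
best = greedy_action(q_table, state=0)
```

Because the whole policy is a lookup in q_table, choosing an action is only a search for the largest entry in one row, which is why no neural network library is required at this stage.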
In Listing 9.3.1, the environment dynamics are represented by
self.transition_table. At every action,
self.transition_table determines the next state. The reward for executing an action is stored in
self.reward_table. The two tables are consulted every time an action is executed by the