October 2018
Intermediate to advanced
368 pages
9h 20m
English
The environment and the Q-Learning discussed in the previous section can be implemented in Python. Since the policy is just a simple table, there is, at this point in time no need for Keras. Listing 9.3.1 shows q-learning-9.3.1.py, the implementation of the simple deterministic world (environment, agent, action, and Q-Table algorithms) using the QWorld class. For conciseness, the functions dealing with the user interface are not shown.
In this example, the environment dynamics is represented by self.transition_table. At every action, self.transition_table determines the next state. The reward for executing an action is stored in self.reward_table. The two tables are consulted every time an action is executed by the step()