Deep Reinforcement Learning Hands-On
by Oleg Vasilev, Maxim Lapan, Martijn van Otterlo, Mikhail Yurushkin, Basem O. F. Alijla
Q-learning for FrozenLake
The whole example is in the Chapter05/02_frozenlake_q_learning.py file, and the difference is really minor. The most obvious change is to our value table. In the previous example, we kept the value of the state, so the key in the dictionary was just a state. Now we need to store values of the Q-function, which has two parameters: state and action, so the key in the value table is now a composite.
The second difference is in our calc_action_value function. We just don't need it anymore, as our action values are stored in the value table. Finally, the most important change in the code is in the agent's value_iteration method. Before, it was just a wrapper around the calc_action_value call, which did the job of Bellman approximation. ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access