May 2018
Beginner
490 pages
13h 16m
English
In the last part of Reinforcement_Learning_Q_function.py, from the first chapter, the training loop runs over a range of 50,000 episodes.
The idea is to set the number of episodes high enough that convergence is certain. In the following code, the range (50000) is a constant:
for i in range(50000):
    current_state = ql.random.randint(0, int(Q.shape[0]))
    PossibleAction = possible_actions(current_state)
    action = ActionChoice(PossibleAction)
    reward(current_state, action, gamma)
Convergence, in this case, is defined as the point beyond which the Q result matrix no longer changes, no matter how much longer you run the system.
By setting the range to 50000, you can test and verify this. As long as the reward matrices remain homogeneous, this will work. ...
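One way to test this kind of convergence is to train twice with the same random seed, once for fewer episodes and once for the full 50,000, and compare the two Q matrices. The sketch below is only an illustration under stated assumptions: the 4-state reward matrix R, the train function, and gamma=0.8 are hypothetical stand-ins, not the contents of Reinforcement_Learning_Q_function.py.

```python
import numpy as np

# Hypothetical 4-state reward matrix (an assumption, not the book's matrix):
# R[s, a] > 0 means the move from state s to state a is allowed,
# and 100 marks a move into the goal state.
R = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 100],
    [0, 0, 1, 100],
], dtype=float)

def train(episodes, gamma=0.8, seed=0):
    """Run `episodes` random Q-updates and return the resulting Q matrix."""
    rng = np.random.default_rng(seed)
    Q = np.zeros_like(R)
    for _ in range(episodes):
        state = rng.integers(0, R.shape[0])          # random starting state
        allowed = np.flatnonzero(R[state] > 0)       # actions with a reward
        action = allowed[rng.integers(len(allowed))] # pick one at random
        # Q-update: immediate reward plus discounted best value of next state
        Q[state, action] = R[state, action] + gamma * Q[action].max()
    return Q

# Same seed, so the long run repeats the short run and then keeps going.
Q_short = train(5_000)
Q_long = train(50_000)

# Largest change in any Q entry caused by the extra 45,000 episodes.
max_diff = np.abs(Q_long - Q_short).max()
print("largest change after extra episodes:", max_diff)
```

If max_diff is effectively zero, running longer no longer changes the matrix, which is the definition of convergence used here. If it is not, the number of episodes was too low for that reward matrix.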