We can also solve MDPs using the pymdptoolbox Python library, which includes a few additional algorithms such as Q-learning.
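As a reminder of the input layout pymdptoolbox expects, here is a brief, hedged sanity check; it assumes the transitions and rewards arrays built for the gridworld in the previous section are still in scope (the package installs as mdptoolbox):
import numpy as np
from mdptoolbox import mdp

# Transition probabilities are indexed as transitions[action, state, next_state];
# rewards can be provided per state, per (state, action), or per (action, state, next_state).
print(transitions.shape)   # (n_actions, n_states, n_states)
print(rewards.shape)       # shape depends on how the rewards were encoded
# Each (action, state) row must be a valid probability distribution:
assert np.allclose(transitions.sum(axis=2), 1.0)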
To run ValueIteration, just instantiate the corresponding object with the desired configuration options and the rewards and transition matrices before calling the .run() method:
vi = mdp.ValueIteration(transitions=transitions,
                        reward=rewards,
                        discount=gamma,
                        epsilon=epsilon)
vi.run()
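Once .run() completes, the solver exposes its results as attributes, which we can inspect directly (a minimal look, using the attribute names defined by pymdptoolbox):
print(vi.iter)                # iterations until the epsilon-based convergence criterion was met
print(np.asarray(vi.V))       # value function estimate, one entry per state
print(np.asarray(vi.policy))  # greedy action index for each state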
The value function estimate matches the result in the previous section:
np.allclose(V.reshape(grid_size), np.asarray(vi.V).reshape(grid_size))
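The same check can be applied to the policy; the snippet below is only a sketch that assumes the greedy policy derived in the previous section is available as an array of action indices called policy (a hypothetical name):
# `policy` is assumed to hold the action indices computed in the previous section.
np.allclose(policy.reshape(grid_size), np.asarray(vi.policy).reshape(grid_size))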
The PolicyIteration solver works similarly:
pi = mdp.PolicyIteration(transitions=transitions,
                         reward=rewards,
                         discount=gamma,
                         max_iter=1000)
pi.run()
It also yields the ...