December 2018
Beginner to intermediate
684 pages
21h 9m
English
The transition matrix defines the probability of ending up in a certain state, s', for each previous state, s, and action, a: P(s' | s, a). We will demonstrate pymdptoolbox and use one of the formats available to us to specify transitions and rewards. For the transition probabilities, we will create a NumPy array with dimensions A x S x S, where A is the number of actions and S is the number of states.
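To make the A x S x S layout concrete, here is a minimal sketch with toy sizes. The two actions and three states are purely illustrative assumptions and are not the grid used in this example. The point is the indexing convention P[a, s, s'] and the requirement that every row sums to one.

```python
import numpy as np

# Illustrative sizes only (assumption): 2 actions, 3 states
n_actions, n_states = 2, 3

# P[a, s, s_next] holds the probability of reaching s_next from s under action a
P = np.zeros((n_actions, n_states, n_states))

# Hypothetical action 0: stay in the current state with certainty
P[0] = np.eye(n_states)

# Hypothetical action 1: advance to the next state with probability 0.8,
# otherwise stay; the last state can only stay where it is
for s in range(n_states):
    nxt = min(s + 1, n_states - 1)
    P[1, s, nxt] += 0.8
    P[1, s, s] += 0.2

# Each P[a, s, :] must be a probability distribution over next states
row_sums = P.sum(axis=2)
is_stochastic = bool(np.allclose(row_sums, 1.0))
```

Checking the row sums before handing the array to a solver is a cheap way to catch indexing mistakes early.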
First, we compute the target cell for each starting cell and move:
def get_new_cell(state, move):
    cell = to_2d(state)
    if actions[move] == 'U':
        return cell[0] - 1, cell[1]
    elif actions[move] == 'D':
        return cell[0] + 1, cell[1]
    elif actions[move] == 'R':
        return cell[0], cell[1] + 1
    elif actions[move] == 'L':
        return cell[0], cell[1] - 1
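The function above relies on `to_2d` and `actions`, which are defined elsewhere. The following self-contained sketch shows how it might behave under assumed definitions: a 3 x 4 grid with row-major state indices, an `actions` list in the order shown, and a `to_2d` helper built on `divmod`. These are all stand-ins for illustration and may differ from the actual setup. The sketch also replaces the `if`/`elif` chain with an offset lookup, which should be equivalent for these four moves.

```python
# Assumed setup (not shown in the excerpt): 3 x 4 grid, row-major state indices
N_ROWS, N_COLS = 3, 4
actions = ['U', 'D', 'L', 'R']  # assumed ordering of moves


def to_2d(state):
    """Hypothetical helper: convert a flat state index to (row, col)."""
    return divmod(state, N_COLS)


# Row and column offsets for each move
MOVES = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}


def get_new_cell(state, move):
    """Return the (row, col) reached from `state` by action index `move`."""
    row, col = to_2d(state)
    d_row, d_col = MOVES[actions[move]]
    return row + d_row, col + d_col


# State 5 sits at (1, 1); collect the target cell for every move
targets = {actions[m]: get_new_cell(5, m) for m in range(len(actions))}

# Moving up from the top-left corner yields a cell outside the grid,
# so the caller has to handle off-grid targets
off_grid = get_new_cell(0, 0)
```

Note that the function performs no bounds checking, so moves from edge cells return coordinates outside the grid.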
The following function uses the argument's ...