December 2018
Beginner to intermediate
684 pages
21h 9m
English
We will begin by defining the environment parameters:
grid_size = (3, 4)
blocked_cell = (1, 1)
baseline_reward = -0.02
absorbing_cells = {(0, 3): 1, (1, 3): -1}
actions = ['L', 'U', 'R', 'D']
num_actions = len(actions)
probs = [.1, .8, .1, 0]
We will frequently need to convert between one-dimensional and two-dimensional representations, so we will define two helper functions for this purpose. States are one-dimensional indices, and cells are the corresponding two-dimensional (row, column) coordinates:
import numpy as np
to_1d = lambda x: np.ravel_multi_index(x, grid_size)
to_2d = lambda x: np.unravel_index(x, grid_size)
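As a quick check of these helpers, the following minimal sketch converts a cell to its state index and back. It assumes NumPy is imported as np and repeats the grid_size defined above so that it runs on its own; the row-major ordering it shows is NumPy's default.

```python
import numpy as np

grid_size = (3, 4)  # 3 rows, 4 columns, as defined above

to_1d = lambda x: np.ravel_multi_index(x, grid_size)
to_2d = lambda x: np.unravel_index(x, grid_size)

# Cell (row=1, col=3) maps to state 1 * 4 + 3 = 7 in row-major order
state = to_1d((1, 3))
print(state)  # 7

# Converting back recovers the original cell coordinates
cell = to_2d(state)
print(tuple(int(i) for i in cell))  # (1, 3)
```

Note that (1, 3) is one of the absorbing cells, so state 7 is the index at which its reward of -1 would be looked up in any one-dimensional array of states.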
Furthermore, we will precompute some values to make the code more concise:
num_states = np.prod(grid_size)
cells = list(np.ndindex(grid_size))
...
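To make these precomputed values concrete, here is a small sketch that shows what they contain for the 3 × 4 grid. It assumes only the grid_size from above and uses np.prod, the current NumPy name for the product function; the states list is a hypothetical helper added purely for illustration.

```python
import numpy as np

grid_size = (3, 4)

num_states = int(np.prod(grid_size))   # 3 * 4 = 12 states
cells = list(np.ndindex(grid_size))    # all (row, col) pairs in row-major order

print(num_states)   # 12
print(cells[:5])    # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]

# The position of a cell in `cells` equals its one-dimensional state index
states = [int(np.ravel_multi_index(c, grid_size)) for c in cells]
print(states == list(range(num_states)))  # True
```

Because np.ndindex iterates in the same row-major order that np.ravel_multi_index uses, cells[s] is always the cell that corresponds to state s.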