May 2018 | Beginner | 490 pages | 13h 16m | English
The MDP reward matrix (see Chapter 1, Become an Adaptive Thinker) is initialized to zero, except for the entries representing edges of the graph that can be physically accessed; these are set to 1, a small, neutral value.
As it stands, the MDP cannot provide a satisfactory result beyond reproducing the structure of this undirected graph. The reward matrix is now initialized and duplicated, as shown in the following code, starting from line 41; at line 43, R, the reward matrix, is built, and at line 50, Ri, a duplicate of R, is created:
41: # R is The Reward Matrix for each state built on the physical graph
42: # Ri is a memory of this initial state: no rewards and undirected
43: R = ql.matrix([ [0,0,0,0,1,0],[0,0,0,1,0,1], ...
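The idea behind this initialization can be sketched in a self-contained way: build R from the list of physically accessible edges, then keep an independent copy, Ri, as a memory of the initial state. This is a minimal sketch, not the book's listing; the `edges` list below is illustrative (only the first two matrix rows appear in the excerpt above), and plain NumPy is used in place of the book's `ql` alias.

```python
import numpy as np

# Hypothetical edge list for a small 6-node undirected graph.
# The first two edges reproduce the rows shown in the excerpt
# ([0,0,0,0,1,0] and [0,0,0,1,0,1]); the rest are placeholders.
edges = [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4)]
n = 6

# R: zero everywhere except physically accessible edges, set to 1,
# a small, neutral value.
R = np.zeros((n, n))
for i, j in edges:
    R[i, j] = 1
    R[j, i] = 1  # undirected graph: both directions are accessible

# Ri: an independent copy of R, memorizing the initial state
Ri = R.copy()
```

Because the graph is undirected, R comes out symmetric, and since `Ri` is a true copy rather than a reference, later updates to R (such as adding a real reward) leave the memory of the initial state untouched.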