May 2018
Beginner
490 pages
13h 16m
English
Unlike traditional software, the reinforcement learning program we studied contains no trace of a specific application field. It consists of Bellman's equation applied with stochastic (random) choices that are constrained by the reward matrix. The goal is to find a route to C (row 3, column 3), which has an attractive reward (100):
# Markov Decision Process (MDP) - Bellman's equations adapted to
# Reinforcement Learning with the Q action-value(reward) matrix
# R is The Reward Matrix for each state
R = ql.matrix([ [0,0,0,0,1,0],
                [0,0,0,1,0,1],
                [0,0,100,1,0,0],
                [0,1,1,0,1,0],
                [1,0,0,1,0,0],
                [0,1,0,0,0,0] ])
The program runs that reward matrix through Bellman's equation and produces the following result in Python:
Q :
[[ 0. 0. 0. 0. 258.44 0. ]
 [ 0. 0. 0. 321.8 0. 207.752]
 [ 0. 0. 500. ...
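To make the step from R to Q concrete, here is a minimal sketch of a Q-learning loop over the same reward matrix. It is not the book's listing. It assumes that `ql` in the excerpt is an alias for NumPy (imported here as `np`), that the states are labeled A to F in row order (so C is index 2), that a positive entry in R marks an allowed move, and that the discount factor is 0.8. That last value is an assumption, chosen because it is consistent with the visible entries of Q (for example, 100 / (1 - 0.8) = 500). The names `train` and `greedy_route`, the episode count, and the seed are hypothetical.

```python
import numpy as np

# Reward matrix from the text. Rows are current states, columns are actions
# (the state moved to). Assumed labels: A..F map to indices 0..5.
R = np.array([
    [0, 0,   0, 0, 1, 0],
    [0, 0,   0, 1, 0, 1],
    [0, 0, 100, 1, 0, 0],
    [0, 1,   1, 0, 1, 0],
    [1, 0,   0, 1, 0, 0],
    [0, 1,   0, 0, 0, 0],
], dtype=float)

GAMMA = 0.8        # assumed discount factor (not stated in the excerpt)
EPISODES = 50_000  # assumed number of random updates


def train(R, gamma=GAMMA, episodes=EPISODES, seed=0):
    """Fill Q by repeatedly applying the Bellman update to random moves."""
    rng = np.random.default_rng(seed)
    n_states = R.shape[0]
    Q = np.zeros_like(R)
    for _ in range(episodes):
        # Stochastic choice: a random state, then a random allowed action.
        state = rng.integers(n_states)
        allowed = np.flatnonzero(R[state] > 0)
        action = rng.choice(allowed)
        # Bellman update: immediate reward plus the discounted best value
        # reachable from the state the action leads to.
        Q[state, action] = R[state, action] + gamma * Q[action].max()
    return Q


def greedy_route(Q, start, goal, max_steps=10):
    """Follow the highest Q value from each state until the goal is reached."""
    route = [start]
    while route[-1] != goal and len(route) <= max_steps:
        route.append(int(np.argmax(Q[route[-1]])))
    return route


Q = train(R)
route = greedy_route(Q, start=0, goal=2)
print(np.round(Q, 3))
print(route)
```

Under these assumptions, following the largest value in each row of Q from A leads through E and D to C. Nothing in the loop refers to what the states represent, which is the point the text makes about the program being independent of any specific field.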