February 2020
Intermediate to advanced
328 pages
8h 19m
English
In step 1, we defined action probability matrices for each action. We can interpret this as the probability of transitioning from a current state to the next state using an action. Let's say that if the agent is in state 2 and tries to go LEFT, there is a 90% probability that the agent will transition to state 1.
The following image represents the transition probability matrix for the LEFT action:

In step 2, we defined a reward matrix; that is, a scalar reward that's given to an agent for transitioning from the current state to the next one.
The following image represents the reward matrix:
In the last step, we solved the ...
Read now
Unlock full access