June 2018
Intermediate to advanced
546 pages
13h 30m
English
In the simple example we just saw, we exploited the structure of the environment to calculate the values of states and actions: there were no loops in the transitions, so we could start from the terminal states, calculate their values, and then work back to the central state. However, a single loop in the environment is enough to break this approach. Let's consider such an environment with two states:
Figure 7: A sample environment with a loop in the transition diagram
We start from state
, and the only action we can take leads ...
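To make the obstacle concrete, here is a minimal Python sketch of the "start from the terminal states and work back" calculation. Everything in it is an assumption made for illustration: the state names, the rewards, the deterministic transitions, and the definition of a state's value as the best undiscounted total reward reachable from it. The two-state loop below is a hypothetical stand-in, not a reproduction of the environment in the figure.

```python
# Hypothetical environments: state -> {action: (reward, next_state)}.
# A state with no actions is terminal. All names and rewards are made up.
ACYCLIC_ENV = {
    "center": {"left": (1.0, "end_a"), "right": (2.0, "end_b")},
    "end_a": {},
    "end_b": {},
}

LOOP_ENV = {
    "a": {"go": (1.0, "b")},
    "b": {"go": (2.0, "a")},  # leads back to "a", closing the loop
}


def state_value(env, state):
    """Best undiscounted total reward from `state`, found by recursing
    down to terminal states and summing rewards on the way back."""
    actions = env[state]
    if not actions:  # terminal state: nothing more to collect
        return 0.0
    values = []
    for reward, next_state in actions.values():
        values.append(reward + state_value(env, next_state))
    return max(values)


def try_state_value(env, state):
    """Return the value, or None when the recursion never reaches a terminal state."""
    try:
        return state_value(env, state)
    except RecursionError:
        return None


print(state_value(ACYCLIC_ENV, "center"))  # 2.0
print(try_state_value(LOOP_ENV, "a"))      # None: the recursion never bottoms out
```

In the loop-free environment the recursion reaches a terminal state on every path, so the values propagate back to the starting state. In the looped one, the value of each state depends on the value of the other, and this naive calculation has no place to start from.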