How it works...
In step 1, we defined the set of possible states and actions for this problem. To work with model-free RL, we need a function that mimics the behavior of the environment. In step 2, we formulated the problem by creating a function called gridExampleEnvironment(), which takes a state-action pair as input and returns a list containing the next state and the associated reward. In step 3, we used the sampleExperience() function to generate state-action transition tuples by querying the environment created in the preceding step. The input arguments to this function are the number of samples, the environment function, and the sets of states and actions. This function returns a dataframe that contains the experienced ...
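The three steps above can be sketched in a few lines of Python. This is a minimal analogue under stated assumptions, not the recipe's own code: the state and action labels, the 2x2 grid layout, the reward values, the snake_case function names, and the record keys (`State`, `Action`, `Reward`, `NextState`) are all hypothetical, and a list of dictionaries stands in for the dataframe.

```python
import random

# Step 1: possible states and actions (labels are illustrative assumptions).
STATES = ["s1", "s2", "s3", "s4"]
ACTIONS = ["up", "down", "left", "right"]

# Hypothetical 2x2 grid with a wall between s1 and s4:
#   s1 | s4
#   s2   s3
# Any (state, action) pair not listed here leaves the agent where it is.
TRANSITIONS = {
    ("s1", "down"): "s2",
    ("s2", "up"): "s1",
    ("s2", "right"): "s3",
    ("s3", "left"): "s2",
    ("s3", "up"): "s4",
}


def grid_example_environment(state, action):
    """Step 2: map a state-action pair to the next state and its reward."""
    next_state = TRANSITIONS.get((state, action), state)
    # Assumed reward scheme: +10 for entering the goal s4, -1 otherwise.
    reward = 10 if next_state == "s4" and state != "s4" else -1
    return {"NextState": next_state, "Reward": reward}


def sample_experience(n, env, states, actions, seed=None):
    """Step 3: query the environment n times with random state-action pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        state = rng.choice(states)
        action = rng.choice(actions)
        outcome = env(state, action)
        rows.append({
            "State": state,
            "Action": action,
            "Reward": outcome["Reward"],
            "NextState": outcome["NextState"],
        })
    return rows


experience = sample_experience(1000, grid_example_environment, STATES, ACTIONS, seed=0)
```

Because the sampler only calls `env(state, action)`, it never needs to know the transition rules. That is what makes the approach model-free: the learner sees only the sampled transitions, not the environment's internals.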