Applying simple policies to a cartpole game

So far, we have randomly picked an action and applied it. Now let us apply some logic to picking the action instead of random chance. The third observation refers to the angle. If the angle is greater than zero, that means the pole is tilting right, thus we move the cart to the right (1). Otherwise, we move the cart to the left (0). Let us look at an example:

  1. We define two policy functions as follows:
def policy_logic(env,obs):    return 1 if obs[2] > 0 else 0def policy_random(env,obs):    return env.action_space.sample()
  1. Next, we define an experiment function that will run for a specific number of episodes; each episode runs until the game is lost, namely when done is True. We use rewards_max to indicate ...

Get Mastering TensorFlow 1.x now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.