October 2019
Intermediate to advanced
366 pages
12h 4m
English
Let's test ME-TRPO on RoboschoolInvertedPendulum, a continuous inverted pendulum environment similar to its well-known discrete-control counterpart, CartPole. A screenshot of RoboschoolInvertedPendulum-v1 is shown here:

The goal is to keep the pole upright by moving the cart. A reward of +1 is obtained for every step in which the pole remains upright.
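Because every step yields the same reward, the return of an episode depends only on how long the pole stays up. Assuming an episode that lasts T steps and a discount factor γ (our notation here, not defined in the text), the return is a geometric series:

$$
G = \sum_{t=0}^{T-1} \gamma^{t} \cdot 1 = \frac{1-\gamma^{T}}{1-\gamma}, \qquad G = T \ \text{when} \ \gamma = 1.
$$

For example, with γ = 0.99 and T = 100, G = (1 − 0.99^100)/0.01 ≈ 63.4. In either case G grows monotonically with T, so maximizing the return amounts to keeping the pole balanced for as long as possible.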
Since ME-TRPO needs the reward function and, consequently, a done function, we have to define both for this task. To this end, we define pendulum_reward, which returns 1 regardless of the observation and action:
def pendulum_reward(ob, ac):
    # +1 at every step, whatever the observation and action are
    return 1
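To see why both functions are needed, recall that the learned dynamics models in ME-TRPO predict only the next observation, so a simulated rollout has to compute rewards and termination itself. The following is a minimal sketch under loudly stated assumptions: pendulum_done, ANGLE_IDX, ANGLE_LIMIT, toy_model and simulated_rollout are all hypothetical names chosen for illustration. The observation is reduced to a single pole angle, and the "learned model" is a hand-written unstable linear system standing in for one member of the ensemble. Only pendulum_reward comes from the text.

```python
# Reward function from the text: +1 at every step, whatever the
# observation and action are.
def pendulum_reward(ob, ac):
    return 1


# HYPOTHETICAL done function. ANGLE_IDX and ANGLE_LIMIT are illustrative
# assumptions, not the real layout of the Roboschool observation vector.
ANGLE_IDX = 0
ANGLE_LIMIT = 0.2  # radians


def pendulum_done(ob):
    # The episode ends once the pole leans further than ANGLE_LIMIT.
    return abs(ob[ANGLE_IDX]) > ANGLE_LIMIT


# HYPOTHETICAL stand-in for one learned dynamics model of the ensemble:
# an unstable linear system in which the action pushes the angle back.
def toy_model(ob, ac):
    return [1.5 * ob[ANGLE_IDX] - 0.5 * ac]


def simulated_rollout(model, policy, ob, max_steps):
    """Roll out `policy` inside `model`, summing pendulum_reward until
    pendulum_done fires on the predicted observation or max_steps is reached."""
    total_reward = 0
    for _ in range(max_steps):
        ac = policy(ob)
        next_ob = model(ob, ac)
        total_reward += pendulum_reward(ob, ac)
        if pendulum_done(next_ob):
            break
        ob = next_ob
    return total_reward


# Usage: a passive policy lets the pole fall, a proportional one keeps it up.
print(simulated_rollout(toy_model, lambda ob: 0.0, [0.01], max_steps=100))          # 8
print(simulated_rollout(toy_model, lambda ob: 2.0 * ob[0], [0.01], max_steps=100))  # 100
```

With the passive policy the angle grows by a factor of 1.5 per step and crosses the 0.2 limit on the eighth step, while the proportional policy halves it at every step, so the rollout runs to max_steps. In ME-TRPO, simulated trajectories of this kind, generated with the models of the ensemble, are what the TRPO policy update is trained on.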