January 2020
Intermediate to advanced
432 pages
10h 18m
English
As you can see, taking a step (an update) with TRPO is not trivial, and things are going to get more complicated still. In a single step, the agent has to update both the policy and the value function and also estimate an advantage, an arrangement known as actor-critic. Understanding the full details of the step function is beyond the scope of this book, and you are again referred to the external references. However, it is helpful to review what constitutes a step in TRPO and how it compares in complexity to the other methods we look at later. Open up the sample TRPO folder again and follow the next exercise:
trpo_step(policy_net, ...
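To make the advantage mentioned above a little more concrete, here is a minimal sketch of one common way to estimate it: the discounted return observed from each time step minus the critic's value estimate for that step. This is not the book's `trpo_step` code; the function names `discounted_returns` and `advantages`, the `gamma` parameter, and the sample numbers are all assumptions made for illustration, and the sample code may well use a different estimator.

```python
# Illustrative sketch only: hypothetical helpers, not part of the TRPO sample.

def discounted_returns(rewards, gamma=0.99):
    """Return the discounted sum of future rewards for each time step."""
    returns = []
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma=0.99):
    """Advantage = discounted return minus the critic's value estimate."""
    returns = discounted_returns(rewards, gamma)
    return [ret - val for ret, val in zip(returns, values)]


# Made-up rewards from one short episode and made-up critic estimates.
rewards = [1.0, 1.0, 1.0]
values = [2.0, 1.5, 0.5]
print(advantages(rewards, values, gamma=0.5))  # [-0.25, 0.0, 0.5]
```

A positive advantage means the action turned out better than the critic expected, so the policy update should make it more likely; a negative advantage means the opposite. In an actor-critic method, the policy (actor) is updated using these advantages, while the value function (critic) is updated to predict the returns more accurately.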