Summary
In this chapter, we saw that policy gradient (PG) methods have their own shortcomings and looked at ways to correct them. This led us to explore implementations that improve sampling efficiency and constrain each policy update by clipping the objective function. We did this by looking at the PPO method, which uses a clipped objective function to keep each update within a trust region when calculating the gradient. After that, we looked at adding a new network layer configuration to capture context in the state.
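As a reminder of what the clipped objective in PPO does, here is a minimal sketch in plain Python. It is not the chapter's implementation: the function name `clipped_surrogate`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for illustration. The idea is that the probability ratio between the new and old policy is clipped to the range [1 - eps, 1 + eps], and the objective takes the smaller of the clipped and unclipped terms, so a single update cannot profit from moving the policy far from the old one.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    All names here are hypothetical; inputs are equal-length lists of
    floats, one entry per sampled (state, action) pair.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two candidate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Toy values only: the new policy doubles the action probability.
    value = clipped_surrogate([math.log(0.5)], [math.log(0.25)], [1.0])
    print(value)
```

With a positive advantage, the gain from raising an action's probability is capped once the ratio exceeds 1 + eps; with a negative advantage, the unclipped term is kept when it is the more pessimistic one. In a training loop, the negative of this value would typically be used as the loss.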
We then used this new layer type, an LSTM layer, on top of PPO to see the improvements it generated. Next, we looked at improving sampling using parallel environments and synchronous or asynchronous workers. We did this by implementing ...