Understanding PPO

So far, we have avoided the more advanced inner workings of the proximal policy optimization (PPO) algorithm, and have even steered clear of any policy-versus-model discussion. If you recall, PPO is the reinforcement learning (RL) method, first developed at OpenAI, that powers ML-Agents, and it is a policy-based algorithm. In this chapter, we will look at the differences between policy- and model-based RL algorithms, as well as the more advanced inner workings of the Unity implementation.

The following is a list of the main topics we will cover in this chapter:

  • Marathon reinforcement learning
  • The partially observable Markov decision process
  • Actor-Critic and continuous action spaces
  • Understanding TRPO and PPO
  • Tuning PPO with ...
