
Learn Unity ML-Agents - Fundamentals of Unity Machine Learning by Micheal Lanham

Proximal policy optimization

Thus far, our discussion of RL has covered simpler techniques for building agents: bandits and Q-learning. Q-learning is a popular algorithm and, as we learned, deep Q-networks give us a strong foundation for solving more difficult problems, such as a cart balancing a pole. The following table summarizes the various RL algorithms, the conditions under which each can work, and how each functions:

| Algorithm | Model | Policy | Action | Observation | Operator |
| --- | --- | --- | --- | --- | --- |
| Q-Learning | Model-free | Off-policy | Discrete | Discrete | Q value |
| SARSA (State-Action-Reward-State-Action) | Model-free | On-policy | Discrete | Discrete | Q value |
| DQN (Deep Q-Network) | Model-free | Off-policy | Discrete | Continuous | Q value |
DDPG Deep ...
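
The Policy column in the table separates Q-learning (off-policy) from SARSA (on-policy), even though both operate on a Q value. The following is a minimal sketch of that difference, not code from this book: the function names (`q_learning_target`, `sarsa_target`, `td_update`), the toy Q-table, and the values of `gamma` and `alpha` are all assumptions chosen for illustration.

```python
import numpy as np

def q_learning_target(Q, reward, next_state, gamma):
    # Off-policy: bootstrap from the greedy (highest-valued) action in the
    # next state, regardless of which action the agent actually takes next.
    return reward + gamma * np.max(Q[next_state])

def sarsa_target(Q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action the agent actually selected next.
    return reward + gamma * Q[next_state, next_action]

def td_update(Q, state, action, target, alpha):
    # Move Q(state, action) a fraction alpha toward the target, in place.
    Q[state, action] += alpha * (target - Q[state, action])
    return Q

# Hypothetical toy table: 2 discrete states x 2 discrete actions.
Q = np.array([[0.0, 0.0],
              [1.0, 3.0]])
gamma, alpha = 0.9, 0.5  # assumed discount factor and learning rate
reward, state, action, next_state, next_action = 1.0, 0, 0, 1, 0

q_target = q_learning_target(Q, reward, next_state, gamma)
s_target = sarsa_target(Q, reward, next_state, next_action, gamma)
print("Q-learning target:", q_target)
print("SARSA target:", s_target)
```

With the same transition, the two targets differ whenever the action actually chosen in the next state is not the greedy one. Here the agent picks action 0 in state 1 while action 1 has the higher value, so the Q-learning target comes out larger than the SARSA target.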
