Learn Unity ML-Agents - Fundamentals of Unity Machine Learning by Micheal Lanham

Exploration and exploitation

One of the dilemmas we face in RL is balancing the exploration of all possible actions against the exploitation of the best known action. In the multi-armed bandit problem, our search space was small enough to explore by brute force, essentially by pulling each arm one by one. However, in more complex problems, the number of states could exceed the number of atoms in the known universe. Yes, you read that correctly. In those cases, we need a policy or method that balances exploration and exploitation. There are a few ways to do this, and the following are the most common:

  • Greedy Optimistic: The agent initially starts with high values in its ...
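
The list item above is cut off in this excerpt, so the following is only a minimal sketch of one common reading of the idea, often called optimistic initial values, and not necessarily the book's own implementation. It assumes a bandit whose arms pay a reward of 0 or 1; the function name `run_greedy_bandit`, its parameters, and the payout probabilities are all hypothetical and chosen purely for illustration. The agent always acts greedily, but because every estimate starts above any reward it can actually receive, each arm looks attractive until it has been tried at least once.

```python
import random


def run_greedy_bandit(win_probs, initial_value, steps=1000, seed=0):
    """Greedy action selection on a 0/1-reward bandit (illustrative sketch).

    win_probs     -- hypothetical payout probability of each arm
    initial_value -- starting value estimate shared by every arm
    Returns (estimates, pulls) after `steps` greedy pulls.
    """
    rng = random.Random(seed)
    n_arms = len(win_probs)
    estimates = [float(initial_value)] * n_arms  # current value estimate per arm
    pulls = [0] * n_arms                         # how often each arm was pulled

    for _ in range(steps):
        # Exploit: pick the arm with the highest estimate (first arm wins ties).
        arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0

        # Incremental sample average; the first pull replaces the initial value.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]

    return estimates, pulls


# Assumed payout probabilities for three arms.
probs = [0.2, 0.5, 0.8]

# Optimistic start: 5.0 exceeds the maximum reward of 1, so untried arms
# keep looking better than tried ones until every arm has been sampled.
print(run_greedy_bandit(probs, initial_value=5.0))

# Neutral start: with all estimates at 0 the agent never leaves the first arm.
print(run_greedy_bandit(probs, initial_value=0.0))
```

With the optimistic start, every arm is guaranteed at least one pull before the agent settles into exploiting its estimates. With a start of 0, the same greedy rule stays on the first arm for the whole run, which is the failure that exploration is meant to prevent.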
