Hands-On Machine Learning with C# by Matt R. Cole

SARSA

SARSA (the name gives the sequence away: State, Action, Reward, State, Action) works like this:

  1. The agent starts at state 1
  2. It then performs action 1 and gets reward 1
  3. Next, it moves on to state 2, performs action 2, and gets reward 2
  4. Then, the agent goes back and updates the value of action 1 in state 1, using reward 1 and the value of action 2 in state 2

As you can see, the difference between the two algorithms lies in how the future reward is estimated. Q-learning uses the highest action value available in state 2, while SARSA uses the value of the action that is actually taken there.
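To make the contrast concrete, here is a minimal Python sketch of the two update steps (Python is used here for brevity; it is not the book's C# code). The names `sarsa_update`, `q_learning_update`, and `make_q`, the learning rate `ALPHA`, the discount factor `GAMMA`, and the toy state and action labels are all assumptions made for illustration.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value, for illustration only)
GAMMA = 0.9  # discount factor (assumed value, for illustration only)


def sarsa_update(q, s1, a1, r1, s2, a2, alpha=ALPHA, gamma=GAMMA):
    """Update Q(s1, a1) using the value of the action actually taken in s2."""
    target = r1 + gamma * q[(s2, a2)]
    q[(s1, a1)] += alpha * (target - q[(s1, a1)])
    return q[(s1, a1)]


def q_learning_update(q, s1, a1, r1, s2, actions, alpha=ALPHA, gamma=GAMMA):
    """Update Q(s1, a1) using the highest action value available in s2."""
    target = r1 + gamma * max(q[(s2, a)] for a in actions)
    q[(s1, a1)] += alpha * (target - q[(s1, a1)])
    return q[(s1, a1)]


def make_q():
    """Build a toy Q-table: unseen entries default to 0.0."""
    q = defaultdict(float)
    q[("state2", "left")] = 1.0   # hypothetical value of a poor action
    q[("state2", "right")] = 5.0  # hypothetical value of the best action
    return q


actions = ["left", "right"]

# The agent took "go" in state1, received reward 1.0, landed in state2,
# and then actually chose "left" (for example, because it was exploring).
sarsa_value = sarsa_update(make_q(), "state1", "go", 1.0, "state2", "left")
q_value = q_learning_update(make_q(), "state1", "go", 1.0, "state2", actions)
```

With these toy numbers, SARSA moves the value of the first action to about 0.95 because it follows the action that was really taken, while Q-learning moves it to about 2.75 because it assumes the best action in state 2.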

Here is the mathematical intuition for SARSA:
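The standard form of the SARSA update rule, written with the state, action, and reward numbering from the steps above, is given below. The symbols are assumptions, since the text does not name them: Q is the action-value estimate, \alpha the learning rate, and \gamma the discount factor.

```latex
Q(s_1, a_1) \leftarrow Q(s_1, a_1) + \alpha \left[ r_1 + \gamma \, Q(s_2, a_2) - Q(s_1, a_1) \right]
```

Q-learning would replace the term Q(s_2, a_2) with \max_{a} Q(s_2, a), the highest action value available in state 2.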
