3. SARSA

In this chapter we look at SARSA, our first value-based algorithm. It was invented by Rummery and Niranjan in their 1994 paper “On-Line Q-Learning Using Connectionist Systems” [118] and was given its name because “you need to know State-Action-Reward-State-Action before performing an update.”¹

1. Rummery and Niranjan did not actually use the name SARSA in their 1994 paper [118]; they preferred “Modified Connectionist Q-Learning.” The name SARSA was suggested by Richard Sutton, and it stuck.

Value-based algorithms evaluate state-action pairs (s, a) by learning one of the value functions, V^π(s) or Q^π(s, a), and use these evaluations to select actions. Learning ...
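To make the State-Action-Reward-State-Action idea concrete, here is a minimal tabular sketch of one update followed by value-based action selection. It is an illustration under simplifying assumptions, not the chapter's implementation: the Q-function is a plain dictionary rather than a neural network, and the function names (`epsilon_greedy`, `sarsa_update`), the state and action labels, and the values chosen for `alpha`, `gamma`, and `epsilon` are all hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular SARSA update from the tuple (s, a, r, s', a')."""
    # The target uses the action actually chosen in the next state (a_next),
    # which is why the full State-Action-Reward-State-Action tuple is needed.
    target = r if done else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


# Hypothetical two-action problem; unseen Q-values default to 0.0.
q = defaultdict(float)
rng = random.Random(0)
actions = ["left", "right"]
q[("s1", "right")] = 1.0

# Choose the next action from the current Q-estimates (epsilon=0.0 means greedy).
a_next = epsilon_greedy(q, "s1", actions, epsilon=0.0, rng=rng)

# Update Q(s0, right) after observing reward 0.5 and landing in s1.
new_value = sarsa_update(q, "s0", "right", 0.5, "s1", a_next, alpha=0.5, gamma=0.9)
```

The same Q-estimates serve two purposes here: `epsilon_greedy` uses them to select actions, and `sarsa_update` improves them from experience, which is the sense in which a value-based algorithm uses its evaluations to act.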
