SARSA

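State-Action-Reward-State-Action (SARSA) is an on-policy TD control algorithm. As in Q learning, we focus on state-action values Q(s,a) rather than state values V(s). In SARSA, we update the Q value using the following update rule:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma Q(s',a') - Q(s,a) \right)$$

Here, α is the learning rate, γ is the discount factor, r is the reward received for taking action a in state s, s' is the resulting next state, and a' is the action selected in s'.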
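
A minimal sketch of this update in Python follows, assuming a tabular Q stored as a dictionary keyed by (state, action) pairs. The function names sarsa_update and q_learning_target, the states, actions, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow. The Q learning target is included purely for contrast.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """One SARSA step: move Q[(s, a)] toward r + gamma * Q[(s_next, a_next)]."""
    td_target = r + gamma * Q[(s_next, a_next)]  # uses the action actually chosen in s_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

def q_learning_target(Q, r, s_next, actions, gamma):
    """For contrast: Q learning bootstraps from the best action in s_next."""
    return r + gamma * max(Q[(s_next, b)] for b in actions)

# Hypothetical values: in state 1, action 0 is worth 2.0 and action 1 is worth 5.0
Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
Q[(1, 0)], Q[(1, 1)] = 2.0, 5.0

# Suppose the epsilon-greedy policy explored and picked action 0 in state 1
print(q_learning_target(Q, r=1.0, s_next=1, actions=(0, 1), gamma=0.5))  # 3.5
print(sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.5))  # 1.0
```

For the same transition, the SARSA target is 1.0 + 0.5 * 2.0 = 2.0 because it bootstraps from the exploratory action that was actually taken, so Q(0, 1) moves from 0.0 to 0.0 + 0.5 * (2.0 - 0.0) = 1.0. Q learning would instead use 1.0 + 0.5 * 5.0 = 3.5.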

In the preceding equation, you may notice that there is no max Q(s',a') term, as there was in Q learning. The update simply uses Q(s',a'), where a' is the action that the current (for example, epsilon-greedy) policy actually selects in s'. The steps involved in SARSA are as follows:

  1. First, we initialize the Q values to some arbitrary values
  2. We select an action using the epsilon-greedy policy and move from one state to another
  3. We update the ...
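
Putting the steps together, a complete tabular SARSA loop might look like the following minimal sketch. It is not the book's own implementation: the corridor environment, the helper names step and epsilon_greedy, the hyperparameters (alpha=0.1, gamma=0.9, epsilon=0.1, 500 episodes), and the choice to use only the reward as the target at a terminal state are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a corridor with states 0..4.
# Action 0 moves left, action 1 moves right. Reaching state 4 ends the episode
# with reward 1.0; every other transition gives reward 0.0.
ACTIONS = (0, 1)
GOAL = 4

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(Q, state, epsilon):
    """Explore with probability epsilon; otherwise pick a greedy action (ties broken at random)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    Q = defaultdict(float)  # initialize Q values arbitrarily (here: all zero)
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon)  # choose a in s with the epsilon-greedy policy
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            # Choose a' in s' with the SAME epsilon-greedy policy; this makes SARSA on-policy
            next_action = epsilon_greedy(Q, next_state, epsilon)
            # A terminal state has no future value, so the target is just the reward
            target = reward if done else reward + gamma * Q[(next_state, next_action)]
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            # The chosen a' becomes the action actually taken on the next step
            state, action = next_state, next_action
    return Q

Q = sarsa()
greedy_policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(greedy_policy)
```

Running it prints the greedy action for states 0 to 3; with these settings the agent should learn to prefer moving right (action 1), although the exact Q values depend on the random seed. Note that a' is chosen before the update and then reused as the next action, which is the key difference from Q learning, where the action taken next need not be the one used in the target.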

Get Hands-On Reinforcement Learning with Python now with the O’Reilly learning platform.