June 2018
State-Action-Reward-State-Action (SARSA) is an on-policy TD control algorithm. As in Q-learning, we focus on the state-action value rather than the state value. In SARSA, we update the Q value using the following update rule:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma \, Q(s',a') - Q(s,a) \right)$$

Here, α is the learning rate, γ is the discount factor, r is the reward, and (s',a') is the next state-action pair.

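As a rough illustration, a single SARSA update might be sketched in Python as follows. The function name `sarsa_update`, the dictionary-based Q table, the state and action labels, and the values chosen for the learning rate `alpha` and discount factor `gamma` are all assumptions made for this example rather than anything fixed by the algorithm.

```python
from collections import defaultdict


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha, gamma):
    """Apply one SARSA update to the table Q in place; return the new value."""
    # The target uses the action actually selected in the next state,
    # not the maximum over all possible next actions.
    td_target = reward + gamma * Q[(next_state, next_action)]
    td_error = td_target - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]


# Hypothetical example: every unseen (state, action) pair starts at 0.0.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0

new_value = sarsa_update(Q, "s0", "right", reward=1.0,
                         next_state="s1", next_action="right",
                         alpha=0.5, gamma=0.5)
print(new_value)  # 0 + 0.5 * (1 + 0.5 * 2 - 0) = 1.0
```

Because the next action is passed in explicitly, the caller has to choose it with the same policy that is being learned (for example, epsilon-greedy) before the update is applied.
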
In the preceding equation, notice that there is no max Q(s',a') term, as there was in Q-learning; here it is simply Q(s',a'), the value of the action actually selected in the next state. Walking through the algorithm step by step makes this clearer. The steps involved in SARSA are as follows: