April 2018
Intermediate to advanced
334 pages
10h 18m
English
The State–Action–Reward–State–Action (SARSA) algorithm is an on-policy learning algorithm. Like Q-learning, SARSA is a temporal difference (TD) learning method; that is, it looks ahead at the next step in the episode to estimate future rewards. The major difference from Q-learning is that SARSA does not use the action with the maximum Q-value to update the Q-value of the current state-action pair. Instead, it uses the Q-value of the action actually selected by the current policy, which may be an exploratory choice such as one made by an ε-greedy strategy. The name SARSA comes from the fact that the update uses the quintuple (s, a, r, s′, a′): the current state and action, the reward received, and the next state and action.
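The on-policy update described above can be sketched in a few lines of Python. The following is a minimal illustration, not the book's own code: it trains SARSA on a hypothetical five-state chain where the agent starts at state 0 and earns a reward of 1 for reaching state 4. Note how the TD target bootstraps from `Q[(s_next, a_next)]`, the action the ε-greedy policy actually picks next, rather than from `max_a Q[(s_next, a)]` as Q-learning would.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4      # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)         # action 0: step left, action 1: step right

def step(state, action_idx):
    """Deterministic chain dynamics: move, clip to bounds, reward at the goal."""
    next_state = min(max(state + ACTIONS[action_idx], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def epsilon_greedy(Q, state, epsilon):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon)   # choose a from s using the policy
        done = False
        while not done:
            s_next, r, done = step(s, a)
            # SARSA: the next action is sampled from the SAME (ε-greedy)
            # policy and used in the TD target -- this is what makes it
            # on-policy, unlike Q-learning's max over next actions.
            a_next = epsilon_greedy(Q, s_next, epsilon)
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            s, a = s_next, a_next
    return Q

Q = train()
```

After training, the greedy policy derived from `Q` steps right in every non-terminal state; because SARSA evaluates the ε-greedy behaviour policy itself, the learned values are slightly lower than the optimal discounted returns would be under a purely greedy policy.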