Keras Reinforcement Learning Projects
by Giuseppe Ciaburro, Sudharsan Ravichandiran, Suriyadeepan Ramamoorthy
SARSA
As we anticipated in Chapter 1, Overview of Keras Reinforcement Learning, the State-Action-Reward-State-Action (SARSA) algorithm implements an on-policy temporal difference (TD) method, in which the update of the action-value function (Q) is performed based on the result of the transition from the state s = s(t) to the state s' = s(t + 1) via the action a(t), taken on the basis of a selected policy π(s, a).
There are deterministic policies, which always choose the action with the maximum estimated reward, and non-deterministic policies (ε-greedy, ε-soft, and softmax), which preserve an element of exploration during the learning phase.
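As a concrete illustration, an ε-greedy policy can be sketched in a few lines of NumPy. This is a minimal sketch, not the book's own code; the function name `epsilon_greedy` and the convention of storing Q as a 2D array indexed by (state, action) are assumptions for the example:

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng=None):
    """Pick a random action with probability epsilon, else the greedy one.

    Q is assumed to be a 2D array of shape (n_states, n_actions).
    """
    if rng is None:
        rng = np.random.default_rng()
    n_actions = Q.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore: uniform random action
    return int(np.argmax(Q[state]))          # exploit: current best action
```

With ε = 0 this reduces to the purely greedy (deterministic) policy; larger ε trades more exploration for less exploitation.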
In SARSA, it is necessary to estimate the action-value function q(s, a), because the value of a state v(s) alone (the state-value function) is not sufficient ...
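The on-policy update described above can be sketched as a single tabular update step. This is a minimal sketch under stated assumptions, not the book's implementation: `sarsa_update` is an illustrative name, Q is a 2D NumPy array indexed by (state, action), and `alpha` and `gamma` are the usual learning rate and discount factor:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One SARSA step: Q(s,a) <- Q(s,a) + alpha * [r + gamma*Q(s',a') - Q(s,a)].

    On-policy: the TD target uses a_next, the action actually selected
    in s_next by the behavior policy (e.g. epsilon-greedy), not the max.
    """
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

The use of Q(s', a') with the action the policy actually takes, rather than max over actions, is what distinguishes SARSA from off-policy Q-learning.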