Keras Reinforcement Learning Projects
by Giuseppe Ciaburro, Sudharsan Ravichandiran, Suriyadeepan Ramamoorthy
Summary
In this chapter, TD learning algorithms were introduced. TD learning algorithms are based on reducing the differences between the estimates made by the agent at different times. The SARSA algorithm implements an on-policy TD method, in which the action value function Q is updated based on the result of the transition from state s = s(t) to state s' = s(t + 1) via action a(t), chosen according to a selected policy π(s, a). Q-learning, unlike SARSA, is off-policy: while the behavior policy is improved according to the values estimated by Q(s, a), the value function updates its estimates following a strictly greedy secondary policy: given a state, the chosen action is always the one that maximizes the estimated value of Q.
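The on-policy/off-policy distinction comes down to a single term in the TD target. As a minimal sketch (not the book's code; all names such as alpha, gamma, and the toy table sizes are illustrative assumptions), the two tabular update rules can be written as:

```python
import numpy as np

# Illustrative hyperparameters (assumed, not from the book)
alpha = 0.1   # learning rate
gamma = 0.99  # discount factor

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy TD update: bootstraps from the action a_next that
    the current policy actually selected in s_next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next):
    """Off-policy TD update: bootstraps from the greedy action in
    s_next (max over actions), regardless of which action the
    behavior policy will take next."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

# Toy example: 5 states, 2 actions, a single transition with reward 1
Q = np.zeros((5, 2))
sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, a_next=0)
```

Note that if the behavior policy happens to pick the greedy action in s_next, the two updates coincide; they differ only when exploration selects a non-greedy action.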