Numerical Computing with Python

by Pratap Dangeti, Allen Yu, Claire Chung, Aldrin Yim

December 2018

Beginner to intermediate

682 pages

18h 1m

English

Packt Publishing

SARSA on-policy TD control

State-action-reward-state-action (SARSA) is an on-policy TD control method in which the policy is optimized using generalized policy iteration (GPI), with TD methods used for the evaluation (prediction) part. In the first step, the algorithm learns an action-value function rather than a state-value function. In particular, for an on-policy method we estimate q_π(s, a) for the current behavior policy π and for all states s and actions a, using essentially the same TD method that is used for learning v_π.

Now, we consider transitions from state-action pair to state-action pair, and learn the values of state-action pairs:
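In the notation commonly used for this method, the update can be written as follows. The step size α, the discount factor γ, and the time-indexed symbols S_t, A_t, R_{t+1} are assumptions here, since they are not defined in the text above:

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma \, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]
```

The five quantities involved, (S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1}), are what give SARSA its name. When S_{t+1} is terminal, Q(S_{t+1}, A_{t+1}) is taken to be zero.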

This update is done after every transition from a non-terminal state ...
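As a rough illustration of the per-transition update, here is a minimal tabular SARSA sketch in plain Python. The corridor environment, the function names (`step`, `epsilon_greedy`, `sarsa`), and all parameter values are hypothetical choices made for this example, not taken from the book. It uses the standard SARSA rule, with the value of a terminal state treated as zero.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    # Break ties at random so no action is favoured by index order.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def step(state, action, n_states):
    """Hypothetical corridor: action 1 moves right, action 0 moves left.

    Reaching the right-most state ends the episode with reward 1.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def sarsa(n_states=5, n_actions=2, episodes=500,
          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular SARSA: evaluate and improve the same epsilon-greedy policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, n_actions, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action, n_states)
            # On-policy: the next action comes from the policy being learned.
            next_action = epsilon_greedy(q, next_state, n_actions, epsilon, rng)
            # The value of a terminal state is defined as zero.
            bootstrap = 0.0 if done else gamma * q[(next_state, next_action)]
            q[(state, action)] += alpha * (reward + bootstrap - q[(state, action)])
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    q_table = sarsa()
    for s in range(4):
        print(s, round(q_table[(s, 0)], 3), round(q_table[(s, 1)], 3))
```

Because the action used in the bootstrap term is the one the agent actually takes next, the learned values reflect the exploring policy itself. This is what makes the method on-policy.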


ISBN: 9781789953633
