Numerical Computing with Python

by Pratap Dangeti, Allen Yu, Claire Chung, Aldrin Yim
December 2018
Beginner to intermediate
682 pages
18h 1m
English
Packt Publishing

Q-learning - off-policy TD control

Q-learning is one of the most widely used methods in practical reinforcement learning applications. It is an off-policy temporal-difference (TD) control algorithm: the learned action-value function, Q, directly approximates q*, the optimal action-value function, independent of the policy being followed. This simplifies the analysis of the algorithm and enabled early convergence proofs. The policy still has an effect, in that it determines which state-action pairs are visited and updated. However, all that is required for correct convergence is that all pairs continue to be updated. ...
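To make the off-policy idea concrete, here is a minimal sketch of tabular Q-learning. It is not the book's own listing. The environment (a five-state chain where action 1 moves right, action 0 moves left, and reaching the last state pays a reward of 1), the function name `q_learning`, and the parameter values `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration. The behaviour policy is epsilon-greedy, but the update target uses the maximum over next actions, which is what makes the method off-policy.

```python
import random


def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    Action 1 moves one state to the right, action 0 one state to the left
    (clamped at state 0). Entering the last state gives reward 1 and ends
    the episode; every other transition gives reward 0.
    """
    rng = random.Random(seed)
    terminal = n_states - 1
    Q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Behaviour policy: epsilon-greedy with random tie-breaking,
            # so every state-action pair keeps being visited.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])

            # Assumed deterministic chain dynamics.
            s_next = min(s + 1, terminal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == terminal else 0.0

            # Off-policy TD target: greedy value of the next state,
            # regardless of which action the behaviour policy takes next.
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next

    return Q


if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, [round(v, 3) for v in values])
```

In this sketch the terminal state's row is never updated and stays at zero, so the target for the final transition reduces to the reward alone. Under the assumed dynamics, the learned values should favour moving right in every non-terminal state.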


ISBN: 9781789953633