Numerical Computing with Python

by Pratap Dangeti, Allen Yu, Claire Chung, Aldrin Yim
December 2018
Beginner to intermediate
682 pages
18h 1m
English
Packt Publishing
Content preview from Numerical Computing with Python

TD prediction

Both TD and MC methods use experience to solve the prediction problem. Given some policy π, both methods update their estimate V of v_π for the non-terminal states S_t occurring in that experience. Monte Carlo methods wait until the return following the visit is known, then use that return as a target for V(S_t).

A simple Monte Carlo method of this kind, with the update V(S_t) ← V(S_t) + α[G_t − V(S_t)], is called constant-α MC. It must wait until the end of the episode to determine the increment to V(S_t), because only then is G_t known.
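As a rough illustration of why constant-α MC has to wait for the episode to finish, the sketch below applies the update only once a full episode is available. The function name `constant_alpha_mc_update`, the dictionary representation of V, the `(state, reward)` episode format, and the discount factor `gamma` are all assumptions made for this example; they do not come from the book.

```python
def constant_alpha_mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Every-visit constant-alpha MC update over one finished episode.

    V       -- dict mapping state -> current value estimate (assumed layout)
    episode -- list of (S_t, R_{t+1}) pairs in the order they occurred
    """
    G = 0.0
    # Walk backwards so that G holds the return following each visited state.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        old = V.get(state, 0.0)  # unseen states start at 0 (assumption)
        V[state] = old + alpha * (G - old)
    return V


# Usage: a two-step episode A -> B -> terminal with rewards 0 and 1.
V = constant_alpha_mc_update({}, [("A", 0.0), ("B", 1.0)], alpha=0.5)
print(V)
```

The return G_t for state A depends on the reward collected after B, so no update can be made for A until the whole episode has been observed.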

TD methods need to wait only until the next time step. At time t+1, they immediately form a target and make a useful update using the observed reward R_{t+1} and the estimate ...
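The excerpt is cut off at this point, so the following is only a sketch under the assumption that the text goes on to describe the standard one-step TD(0) update, in which the target is R_{t+1} + γV(S_{t+1}). The function name `td0_update`, the dictionary representation of V, and the `terminal` flag are hypothetical choices made for this example.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, terminal=False):
    """One-step TD(0) update, applied as soon as the next time step is observed.

    V is a dict mapping state -> value estimate (assumed layout).
    Returns the TD error so the caller can inspect it.
    """
    # A terminal next state contributes no future value to the target.
    bootstrap = 0.0 if terminal else gamma * V.get(next_state, 0.0)
    target = reward + bootstrap
    old = V.get(state, 0.0)  # unseen states start at 0 (assumption)
    td_error = target - old
    V[state] = old + alpha * td_error
    return td_error


# Usage: update A right after observing the transition A -> B with reward 0.5.
V = {"B": 1.0}
td0_update(V, "A", 0.5, "B", alpha=0.5)
print(V["A"])  # 0.75
```

Unlike the Monte Carlo case, the update for a state uses only a single observed transition, so it can be applied online, before the episode ends.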

Publisher Resources

ISBN: 9781789953633