Python Deep Learning - Second Edition

by Ivan Vasilev, Daniel Slater, Gianmario Spacagna, Peter Roelants, Valentino Zocca

January 2019

Intermediate to advanced

386 pages

English

Packt Publishing

Control with Sarsa

Sarsa is an on-policy TD control method. Much like MC control, it estimates the action-value function in order to find the optimal policy, for the same reasons we outlined in the Exploring starts policy improvement section. This time, however, we'll follow the blueprint outlined in the preceding section: we'll iterate over multiple episodes and update the estimates online, after each step of an episode. We can represent this process with a formula similar to the one in the preceding section, except that it applies to the action-value function:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\right]$$

Here, α is the step size (learning rate) and γ is the discount factor.
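The following is a minimal tabular sketch of this procedure. It is not the book's implementation. It assumes a hypothetical toy `Corridor` environment and an ε-greedy policy, and the names `Corridor`, `epsilon_greedy`, and `sarsa` and all hyperparameter values are illustrative choices. At every step the agent observes r_{t+1} and s_{t+1}. It then picks a_{t+1} with the same policy it is improving, which is what makes the method on-policy. Finally, it moves Q(s_t, a_t) a fraction α of the way toward the target r_{t+1} + γQ(s_{t+1}, a_{t+1}). The quintuple (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}) used in each update is where the name Sarsa comes from.

```python
import random
from collections import defaultdict


# Hypothetical toy environment (not from the book): a 1-D corridor with
# states 0..n_states-1. Action 0 moves left, action 1 moves right.
# Reaching the rightmost state ends the episode with reward +1; all other
# steps give reward 0.
class Corridor:
    def __init__(self, n_states=6):
        self.n_states = n_states
        self.n_actions = 2
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp to the corridor bounds (moving left at state 0 stays at 0)
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Assumed policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    # Break ties randomly so an untrained agent does not always pick action 0
    return rng.choice([a for a, v in enumerate(values) if v == best])


def sarsa(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a), implicitly initialized to 0
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(q, state, env.n_actions, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            if done:
                # A terminal state has value 0, so there is no bootstrap term
                target = reward
                next_action = None
            else:
                # On-policy: a_{t+1} comes from the same epsilon-greedy policy
                next_action = epsilon_greedy(q, next_state, env.n_actions, epsilon, rng)
                target = reward + gamma * q[(next_state, next_action)]
            # Online TD update after every step of the episode
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action = next_state, next_action
    return q
```

For example, after `q = sarsa(Corridor(), episodes=2000)`, comparing `q[(s, 1)]` with `q[(s, 0)]` for each non-terminal state `s` should show that the learned policy prefers moving right everywhere.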