Python Deep Learning - Second Edition


by Ivan Vasilev, Daniel Slater, Gianmario Spacagna, Peter Roelants, Valentino Zocca
January 2019
Intermediate to advanced
386 pages
11h 13m
English
Packt Publishing

Epsilon-greedy policy improvement

In the preceding section, we discussed that if we follow a deterministic policy (DP), we might not reach all state/action pairs. This would undermine our efforts to estimate the action-value function. We solved this problem with the exploring-starts assumption. However, this assumption is unusual, and it would be best to avoid it. In fact, the core of our problem is that we follow the policy blindly, which prevents us from exploring all possible state/action pairs. Can we solve this by introducing a different policy? It turns out we can (surprise!). In this section, we'll introduce MC control with a non-deterministic epsilon-greedy (ε-greedy) policy. The core idea is simple. Most of the time the ε-greedy policy behaves ...
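A minimal sketch of ε-greedy action selection and ε-greedy policy improvement follows. It assumes the common formulation in which, with probability ε, the agent picks an action uniformly at random (the greedy action included) and otherwise picks the action with the highest estimated value. The function names `epsilon_greedy_probs`, `epsilon_greedy_action`, and `improve_policy` are hypothetical and not taken from the book's code, and the action-value table is assumed to be a NumPy array of shape `(n_states, n_actions)`.

```python
import numpy as np


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy for a single state.

    Assumed formulation: with probability epsilon choose uniformly among all
    actions (greedy one included), otherwise choose the greedy action.
    """
    q_values = np.asarray(q_values, dtype=float)
    n_actions = q_values.shape[0]
    # Every action gets an equal share of the exploration mass epsilon
    probs = np.full(n_actions, epsilon / n_actions)
    # The greedy action (first argmax on ties) also gets the remaining 1 - epsilon
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return probs


def epsilon_greedy_action(q_values, epsilon, rng):
    """Sample one action index from the epsilon-greedy distribution."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return int(rng.choice(len(probs), p=probs))


def improve_policy(q_table, epsilon):
    """Epsilon-greedy policy improvement: one probability row per state,
    each made epsilon-greedy with respect to the current Q estimates."""
    return np.array([epsilon_greedy_probs(q_row, epsilon) for q_row in q_table])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = [1.0, 3.0, 2.0]  # toy action values for one state
    print(epsilon_greedy_probs(q, epsilon=0.1))
    print(epsilon_greedy_action(q, epsilon=0.1, rng=rng))
    print(improve_policy(np.array([[0.0, 1.0], [2.0, 0.0]]), epsilon=0.2))
```

Under this formulation the greedy action is chosen with probability 1 − ε + ε/|A| and every other action with probability ε/|A|. Because no action in a visited state ever has zero probability, MC control can keep exploring state/action pairs without relying on exploring starts.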




ISBN: 9781789348460