Python Deep Learning - Second Edition

by Ivan Vasilev, Daniel Slater, Gianmario Spacagna, Peter Roelants, Valentino Zocca

January 2019

Intermediate to advanced

386 pages

11h 13m

English

Packt Publishing

Content preview from Python Deep Learning - Second Edition

Double Q-learning

Imagine that most of the actions, a, available from a state, s, have a true action-value of zero, $q_*(s, a) = 0$. That is, the real return for each action starting from state s is 0. Unfortunately, we don't know the true action-values, so we try to estimate them, hoping that our estimates will eventually converge toward the true values. Our estimates, $Q(s, a)$, are uncertain: some might be slightly above 0, while others might be slightly below. And here comes the issue. When we compute the estimation of each state/action pair ...
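The scenario above can be checked numerically. The following is a minimal sketch, not code from the book: it assumes 10 actions whose true value is 0 and models each estimate as the true value plus zero-mean Gaussian noise. The function name `simulate_max_bias` and all of its parameters are hypothetical and chosen only for illustration. It compares taking the maximum over a single set of noisy estimates with the double-estimator idea, where one set of estimates selects the action and a second, independent set evaluates it.

```python
import random
import statistics


def simulate_max_bias(num_actions=10, noise_std=1.0, trials=10_000, seed=0):
    """Compare single- and double-estimator values when every true value is 0.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        # Two independent sets of noisy estimates of the same true values (all 0).
        q_a = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]

        # Single estimator: the same estimates both select and evaluate the action.
        single.append(max(q_a))

        # Double estimator: q_a selects the action, q_b evaluates it.
        best_action = max(range(num_actions), key=q_a.__getitem__)
        double.append(q_b[best_action])

    return statistics.mean(single), statistics.mean(double)


single_mean, double_mean = simulate_max_bias()
print(f"mean of max over one set of estimates: {single_mean:.3f}")
print(f"mean of double-estimator value:        {double_mean:.3f}")
```

Under these assumptions, the first average comes out clearly above 0 even though no action is actually better than any other, while the second stays close to 0, because the noise that picked the action is independent of the noise in its evaluation.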