Hands-On Reinforcement Learning with Python

by Sudharsan Ravichandiran
June 2018
Content level: Intermediate to advanced
318 pages
9h 24m
English
Packt Publishing

The upper confidence bound algorithm

With epsilon-greedy and softmax exploration, we take a random action with some probability. Random actions are useful for exploring the various arms, but they can also lead us to try actions that give no good reward at all. At the same time, we don't want to miss out on arms that are actually good but happen to give poor rewards in the initial rounds. To address this, we use an algorithm called the upper confidence bound (UCB), which is based on the principle of optimism in the face of uncertainty.
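For contrast with UCB, here is a minimal sketch of the two randomized strategies mentioned above. The function names `epsilon_greedy` and `softmax_probs`, the `temperature` parameter, and the list-of-estimates representation are assumptions made for illustration, not code from the book.

```python
import math
import random


def epsilon_greedy(values, epsilon, rng=random):
    """Return an arm index: random with probability epsilon, else the greedy arm."""
    if rng.random() < epsilon:
        # Explore: every arm is equally likely, including clearly bad ones.
        return rng.randrange(len(values))
    # Exploit: pick the arm with the highest estimated reward.
    return max(range(len(values)), key=values.__getitem__)


def softmax_probs(values, temperature=1.0):
    """Return selection probabilities proportional to exp(value / temperature)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

Neither strategy takes into account how often an arm has been tried, which is the gap that a confidence-based approach aims to fill.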

The UCB algorithm helps us select the best arm based on a confidence interval. What is a confidence interval? Say we have two arms. We pull both arms and find that arm one gives us a reward of 0.3 and arm two ...
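The preview ends before the selection rule is stated. As a hedged sketch, the widely used UCB1 variant picks the arm that maximizes the estimated mean reward plus a bonus of sqrt(2 ln t / N(a)), where t is the current round and N(a) is the number of times arm a has been pulled. Whether the book uses exactly this form is an assumption. The names `ucb1_select` and `run_bandit` and the Bernoulli reward model are hypothetical and chosen only for illustration.

```python
import math
import random


def ucb1_select(counts, values, t):
    """Pick an arm index with the UCB1 rule (assumed variant, see lead-in)."""
    # Pull every arm once first so that each count is nonzero.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Estimated mean plus an exploration bonus that shrinks as an arm is pulled more.
    scores = [
        values[arm] + math.sqrt(2.0 * math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(true_probs, rounds, seed=0):
    """Simulate Bernoulli arms (hypothetical reward model) under UCB1."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)    # pulls per arm
    values = [0.0] * len(true_probs)  # running mean reward per arm
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, values, t)
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values
```

Calling `run_bandit([0.3, 0.8], 2000)` returns the pull counts and mean estimates per arm. An arm with few pulls keeps a wide bonus, so it is retried until the estimate is trustworthy, which is the "optimism in the face of uncertainty" idea in practice.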

Publisher Resources

ISBN: 9781788836524