
Bandit Algorithms for Website Optimization

Name: Bandit Algorithms for Website Optimization
Author: John Myles White
ISBN: 9781449341336

by John Myles White

December 2012

Intermediate to advanced

88 pages

1h 58m

English

O'Reilly Media, Inc.


Bandit Algorithms for Website Optimization
Preface
    Finding the Code for This Book
    Dealing with Jargon: A Glossary
    Conventions Used in This Book
    Using Code Examples
    Safari® Books Online
    How to Contact Us
    Acknowledgments
1. Two Characters: Exploration and Exploitation
    The Scientist and the Businessman
    Cynthia the Scientist
    Bob the Businessman
    Oscar the Operations Researcher
    The Explore-Exploit Dilemma
2. Why Use Multiarmed Bandit Algorithms?
    What Are We Trying to Do?
    The Business Scientist: Web-Scale A/B Testing
3. The epsilon-Greedy Algorithm
    Introducing the epsilon-Greedy Algorithm
    Describing Our Logo-Choosing Problem Abstractly
    What’s an Arm?
    What’s a Reward?
    What’s a Bandit Problem?
    Implementing the epsilon-Greedy Algorithm
    Thinking Critically about the epsilon-Greedy Algorithm
4. Debugging Bandit Algorithms
    Monte Carlo Simulations Are Like Unit Tests for Bandit Algorithms
    Simulating the Arms of a Bandit Problem
    Analyzing Results from a Monte Carlo Study
    Approach 1: Track the Probability of Choosing the Best Arm
    Approach 2: Track the Average Reward at Each Point in Time
    Approach 3: Track the Cumulative Reward at Each Point in Time
    Exercises
5. The Softmax Algorithm
    Introducing the Softmax Algorithm
    Implementing the Softmax Algorithm
    Measuring the Performance of the Softmax Algorithm
    The Annealing Softmax Algorithm
    Exercises
6. UCB – The Upper Confidence Bound Algorithm
    Introducing the UCB Algorithm
    Implementing UCB
    Comparing Bandit Algorithms Side-by-Side
    Exercises
7. Bandits in the Real World: Complexity and Complications
    A/A Testing
    Running Concurrent Experiments
    Continuous Experimentation vs. Periodic Testing
    Bad Metrics of Success
    Scaling Problems with Good Metrics of Success
    Intelligent Initialization of Values
    Running Better Simulations
    Moving Worlds
    Correlated Bandits
    Contextual Bandits
    Implementing Bandit Algorithms at Scale
8. Conclusion
    Learning Life Lessons from Bandit Algorithms
    A Taxonomy of Bandit Algorithms
    Learning More and Other Topics

Colophon
Copyright

Content preview from Bandit Algorithms for Website Optimization

Chapter 5. The Softmax Algorithm

Introducing the Softmax Algorithm

If you’ve completed the exercises for Chapter 4, you should have discovered that there’s an obvious problem with the epsilon-Greedy algorithm: it explores options completely at random without any concern about their merits. For example, in one scenario (call it Scenario A), you might have two arms, one of which rewards you 10% of the time and the other 13% of the time. In Scenario B, the two arms might reward you 10% of the time and 99% of the time. In both of these scenarios, the probability that the epsilon-Greedy algorithm explores the worse arm is exactly the same (it’s epsilon / 2), even though the inferior arm in Scenario B is, in relative terms, much worse than the inferior arm in Scenario A.
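To see that the exploration rate does not depend on how bad the worse arm is, here is a minimal Python sketch. It is not the book's code: the function names `epsilon_greedy_select` and `worse_arm_frequency` are hypothetical, and it assumes the algorithm's value estimates have already converged to the true reward rates of each scenario.

```python
import random

def epsilon_greedy_select(epsilon, estimated_values, rng):
    """Exploit the best estimate with probability 1 - epsilon;
    otherwise pick uniformly at random among all arms."""
    if rng.random() > epsilon:
        return max(range(len(estimated_values)), key=lambda i: estimated_values[i])
    return rng.randrange(len(estimated_values))

def worse_arm_frequency(epsilon, estimated_values, n_trials=100_000, seed=0):
    """Fraction of selections that land on the arm with the lowest estimate."""
    rng = random.Random(seed)
    worse = min(range(len(estimated_values)), key=lambda i: estimated_values[i])
    hits = sum(
        epsilon_greedy_select(epsilon, estimated_values, rng) == worse
        for _ in range(n_trials)
    )
    return hits / n_trials

# Assumed estimates: equal to the true reward rates in each scenario.
scenario_a = [0.10, 0.13]
scenario_b = [0.10, 0.99]

freq_a = worse_arm_frequency(0.1, scenario_a)
freq_b = worse_arm_frequency(0.1, scenario_b)
print(freq_a, freq_b)  # both should be close to epsilon / 2 = 0.05
```

The selection rule only ever looks at which estimate is largest, never at how large the gap is, so the size of the difference between the arms has no effect on how often the worse arm is explored.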

This is a problem for several reasons:

If the difference in reward rates between two arms is small, you’ll need to explore a lot more often than 10% of the time to correctly determine which of the two options is actually better.
In contrast, if the difference is large, you need to explore a lot less than 10% of the time to correctly estimate the better of the two options. For that reason, you’ll end up losing a lot of reward by exploring an unambiguously inferior option in this case. When we first described the epsilon-Greedy algorithm, we said that we wouldn’t set epsilon = 1.0 precisely so that we wouldn’t waste time on inferior options, but, if the difference between two arms is large enough, we end up wasting ...
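The two reasons above can be made concrete with a rough back-of-the-envelope calculation. The sketch below is an illustration under stated assumptions, not a method from the book: it uses a normal approximation to two independent Bernoulli averages (crude, especially for rates near 0 or 1) to estimate how many pulls per arm are needed before the gap is about `z` standard errors wide, and it computes the reward given up per pull by exploring the worse of two arms with probability epsilon / 2. The function names are hypothetical.

```python
import math

def pulls_per_arm_to_separate(p_low, p_high, z=1.96):
    """Rough pulls per arm until the gap between the two observed
    reward rates is about z standard errors (normal approximation)."""
    variance_sum = p_low * (1 - p_low) + p_high * (1 - p_high)
    return math.ceil(z ** 2 * variance_sum / (p_high - p_low) ** 2)

def expected_loss_per_pull(epsilon, p_low, p_high):
    """Expected reward given up per pull when the worse of two arms
    is still explored with probability epsilon / 2."""
    return (epsilon / 2) * (p_high - p_low)

scenarios = {"A": (0.10, 0.13), "B": (0.10, 0.99)}
for name, (p_low, p_high) in scenarios.items():
    pulls = pulls_per_arm_to_separate(p_low, p_high)
    loss = expected_loss_per_pull(0.1, p_low, p_high)
    print(f"Scenario {name}: ~{pulls} pulls per arm, loss per pull {loss:.4f}")
```

Under these assumptions, Scenario A needs hundreds of pulls per arm to separate the arms, while Scenario B needs only a handful yet pays a far larger price for every exploratory pull.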


Publisher Resources

ISBN: 9781449341565
