Chapter 8. Conclusion
Learning Life Lessons from Bandit Algorithms
In this book, we’ve presented three algorithms for solving the Multiarmed Bandit Problem:
- The epsilon-Greedy Algorithm
- The Softmax Algorithm
- The UCB Algorithm
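As a refresher, the arm-selection rules at the heart of these three algorithms can be sketched as below. This is a minimal sketch, not the book's full class-based implementations: it assumes that the running reward estimates `values` and play counts `counts` are maintained elsewhere, and it uses the UCB1 variant of UCB.

```python
import math
import random

def select_arm_epsilon_greedy(values, epsilon):
    # With probability epsilon, explore a random arm; otherwise exploit the best.
    if random.random() < epsilon:
        return random.randrange(len(values))
    return values.index(max(values))

def select_arm_softmax(values, temperature):
    # Choose each arm with probability proportional to exp(value / temperature).
    z = [math.exp(v / temperature) for v in values]
    total = sum(z)
    r, cumulative = random.random(), 0.0
    for i, weight in enumerate(z):
        cumulative += weight / total
        if r <= cumulative:
            return i
    return len(z) - 1

def select_arm_ucb(values, counts):
    # Play every arm once first; then add a confidence bonus to each estimate
    # so that rarely played arms are revisited (UCB1).
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    ucb = [v + math.sqrt(2.0 * math.log(total) / n)
           for v, n in zip(values, counts)]
    return ucb.index(max(ucb))
```

Note how the three rules trade off exploration differently: epsilon-Greedy explores uniformly at random, Softmax explores in proportion to estimated value, and UCB explores based on uncertainty.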
To take full advantage of these three algorithms, you'll need to develop a good intuition for how they'll behave when you deploy them on a live website. That intuition matters because there is no universal bandit algorithm that will always do the best job of optimizing a website: domain expertise and good judgment will always be necessary.
To help you develop the intuition and judgment you’ll need, we’ve advocated a Monte Carlo simulation framework that lets you see how these algorithms and others will behave in hypothetical worlds. By testing an algorithm in many different hypothetical worlds, you can build an appreciation for the qualitative dynamics that cause a bandit algorithm to succeed in one scenario and to fail in another.
In this last section, we’d like to help you further down that path by highlighting these qualitative patterns explicitly.
We’ll start off with some general life lessons that we think are exemplified by bandit algorithms, but actually apply to any situation you might ever find yourself in. Here are the most salient lessons:
- Trade-offs, trade-offs, trade-offs
- In the real world, you always have to trade off between gathering data and acting on that data. Pure experimentation ...