Chapter 14. Multiarmed Bandits

This chapter is the first installment in our description of reinforcement learning techniques. Given a problem with multiple candidate solutions, multiarmed bandit techniques acquire knowledge about the behavior of the alternative solutions (exploration) while, at the same time, applying the most rewarding solution found so far (exploitation) to maximize success. This balancing act between experimenting to acquire new knowledge and leveraging knowledge already acquired is the core concept behind multiarmed bandit techniques.

This chapter covers the following topics:

  • Exploration versus exploitation trade-off
  • Minimization of cumulative regret
  • Epsilon-greedy algorithm
  • Upper confidence bound technique
  • Context-free Thompson sampling
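To illustrate the exploration-exploitation trade-off listed above, here is a minimal sketch of the epsilon-greedy strategy in Scala. The class name, parameters, and fixed random seed are illustrative assumptions, not taken from the chapter: with probability epsilon the agent explores a random arm; otherwise it exploits the arm with the highest estimated mean reward.

```scala
import scala.util.Random

// Hypothetical sketch of the epsilon-greedy bandit strategy.
// numArms: number of candidate solutions (arms)
// epsilon: probability of exploring a random arm instead of exploiting
class EpsilonGreedy(numArms: Int, epsilon: Double, rng: Random = new Random(42)) {
  private val counts = Array.fill(numArms)(0)     // pulls per arm
  private val values = Array.fill(numArms)(0.0)   // running mean reward per arm

  // Explore with probability epsilon, otherwise exploit the best arm so far.
  def selectArm(): Int =
    if (rng.nextDouble() < epsilon) rng.nextInt(numArms)
    else values.indexOf(values.max)

  // Incremental update of the mean reward estimate for the chosen arm.
  def update(arm: Int, reward: Double): Unit = {
    counts(arm) += 1
    values(arm) += (reward - values(arm)) / counts(arm)
  }
}
```

A typical loop would call `selectArm()`, observe a reward from the environment, and feed it back through `update`, so the estimates converge toward the true arm values while exploration keeps sampling the alternatives.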

K-armed ...
