Chapter 14. Multiarmed Bandits

This chapter is the first installment in our description of reinforcement learning techniques. Given a problem with multiple candidate solutions, multiarmed bandit techniques acquire knowledge about the behavior of the alternative solutions (exploration) while, at the same time, applying the most rewarding solution found so far (exploitation) to maximize success. This balancing act between experimenting to acquire new knowledge and leveraging knowledge already acquired is the core concept behind multiarmed bandit techniques.

This chapter covers the following topics:

  • Exploration versus exploitation trade-off
  • Minimization of cumulative regret
  • Epsilon-greedy algorithm
  • Upper confidence bound technique
  • Context-free Thompson sampling
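To illustrate the exploration-exploitation trade-off listed above, here is a minimal sketch of the epsilon-greedy strategy in Scala. The class name, parameters, and fixed random seed are illustrative assumptions, not taken from the chapter: with probability epsilon the agent explores a random arm; otherwise it exploits the arm with the highest estimated mean reward.

```scala
import scala.util.Random

// Hypothetical sketch of the epsilon-greedy bandit strategy.
// numArms: number of candidate solutions (arms)
// epsilon: probability of exploring a random arm instead of exploiting
class EpsilonGreedy(numArms: Int, epsilon: Double, rng: Random = new Random(42)) {
  private val counts = Array.fill(numArms)(0)     // pulls per arm
  private val values = Array.fill(numArms)(0.0)   // running mean reward per arm

  // Explore with probability epsilon, otherwise exploit the best arm so far.
  def selectArm(): Int =
    if (rng.nextDouble() < epsilon) rng.nextInt(numArms)
    else values.indexOf(values.max)

  // Incremental update of the mean reward estimate for the chosen arm.
  def update(arm: Int, reward: Double): Unit = {
    counts(arm) += 1
    values(arm) += (reward - values(arm)) / counts(arm)
  }
}
```

A typical loop would call `selectArm()`, observe a reward from the environment, and feed it back through `update`, so the estimates converge toward the true arm values while exploration keeps sampling the alternatives.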

K-armed ...
