We solve the multi-armed bandit problem using the epsilon-greedy policy as follows:
- Import PyTorch and the bandit environment we developed in the previous recipe, Creating a multi-armed bandit environment (assuming the BanditEnv class is in a file called multi_armed_bandit.py):
>>> import torch
>>> from multi_armed_bandit import BanditEnv
- Define the payout probabilities and rewards for the three-armed bandit and create an instance of the bandit environment:
>>> bandit_payout = [0.1, 0.15, 0.3]
>>> bandit_reward = [4, 3, 1]
>>> bandit_env = BanditEnv(bandit_payout, bandit_reward)
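As a quick sanity check on these settings, the expected reward of a single pull can be worked out for each arm. This is only a sketch, and it assumes that each arm pays its full reward with the listed probability and nothing otherwise; the BanditEnv class from the previous recipe is not used here. The names expected_reward and best_arm are introduced for illustration only.

```python
# Payout probabilities and rewards copied from the step above.
bandit_payout = [0.1, 0.15, 0.3]
bandit_reward = [4, 3, 1]

# Assumption: a pull pays reward r with probability p and 0 otherwise,
# so the expected reward of one pull is p * r.
expected_reward = [p * r for p, r in zip(bandit_payout, bandit_reward)]

# Index of the arm with the highest expected reward per pull.
best_arm = max(range(len(expected_reward)), key=lambda i: expected_reward[i])

print(expected_reward)
print(best_arm)
```

Under that assumption, the arm at index 1 has the highest expected reward per pull (roughly 0.45, against 0.4 and 0.3), even though the arm at index 2 pays out most often. A policy that learns well should therefore end up favoring the arm at index 1.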
- We specify the number of episodes to run and define the lists holding the total rewards accumulated by choosing individual arms, the number ...