Grid search is a programmatic way to find the best set of values for hyperparameters in reinforcement learning. The performance of each set of hyperparameters is measured by the following three metrics:
- Average total reward over the first few episodes: We want to get the largest reward as early as possible.
- Average episode length over the first few episodes: We want the taxi to reach the destination as quickly as possible.
- Average reward for each time step over the first few episodes: We want to get the maximum reward as quickly as possible.
Let's go ahead and implement it:
- We herein use three alpha candidates [0.4, 0.5, and 0.6] and three epsilon candidates [0.1, 0.03, and 0.01], and only consider the first 500 episodes: ...