We have already explored the ideas behind the ε-greedy strategy and implemented it to drive exploration in algorithms such as Q-learning and DQN. It is a very simple approach, and yet it performs well even on non-trivial tasks. This is the main reason behind its widespread use in many deep learning algorithms.
To refresh your memory, ε-greedy takes the best action most of the time, but from time to time, it selects a random action. The probability of choosing a random action is dictated by the ε value, ...
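As a reminder of how this selection rule looks in code, here is a minimal sketch in plain Python. The function name `eps_greedy`, the `q_values` list of per-action value estimates, and the `eps` parameter are names assumed for this illustration, not the implementation used earlier for Q-learning or DQN.

```python
import random

def eps_greedy(q_values, eps, rng=random):
    """Pick an action index: random with probability eps, greedy otherwise.

    q_values: estimated value of each action (hypothetical input).
    eps: exploration probability, assumed to lie in [0, 1].
    rng: source of randomness; pass random.Random(seed) for reproducibility.
    """
    if rng.random() < eps:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimate
    # (ties are resolved in favor of the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Example: with eps = 0.1, roughly one call in ten picks a random action.
action = eps_greedy([0.1, 0.9, 0.3], eps=0.1)
```

With `eps=0` the rule is purely greedy, and with `eps=1` it is purely random. Values in between trade off exploitation against exploration.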