There are various dueling policies available:
- Eps greedy policy: The eps greedy policy takes a random action with probability epsilon and the current best action with probability (1 - epsilon).
- Softmax policy: The softmax policy samples an action from a probability distribution over the available actions, so actions with a higher probability are chosen more often.
- Linear annealed policy: The linear annealed policy computes the current threshold value and passes it to an inner policy, which chooses the action. The threshold value decreases linearly over time.
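
The three policies above can be sketched in plain Python as follows. This is only an illustration, not this library's actual API: all function and parameter names (`eps_greedy`, `softmax_policy`, `linear_annealed_eps`, `eps_max`, `eps_min`, `nb_steps`, `temperature`) are hypothetical. It also assumes that the softmax distribution is derived from the action values with a temperature, and that the inner policy of the linear annealed policy is eps greedy with epsilon as the annealed threshold.

```python
import math
import random


def eps_greedy(q_values, eps, rng=random):
    """Pick a random action with probability eps, else the best action."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature=1.0):
    """Assumption: the distribution is a softmax over the action values."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_policy(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def linear_annealed_eps(step, eps_max=1.0, eps_min=0.1, nb_steps=1000):
    """Decay linearly from eps_max to eps_min over nb_steps, then hold."""
    slope = (eps_max - eps_min) / nb_steps
    return max(eps_min, eps_max - slope * step)


def linear_annealed_policy(q_values, step, rng=random, **schedule):
    """Compute the current threshold and hand it to the inner policy."""
    eps = linear_annealed_eps(step, **schedule)
    return eps_greedy(q_values, eps, rng)  # inner policy (assumed eps greedy)
```

For example, `linear_annealed_policy([0.1, 0.9, 0.3], step=0)` starts with epsilon at `eps_max` and therefore acts fully at random, while later steps increasingly favor the action with the highest value.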