ESes are an interesting alternative to RL. Nonetheless, we must weigh their pros and cons so that we can pick the right approach for a given problem. Let's briefly look at the main advantages of ES:
- Derivative-free methods: There's no need for backpropagation. Only the forward pass is performed to estimate the fitness function (or, equivalently, the cumulative reward). This opens the door to non-differentiable functions, for example, hard attention mechanisms. Moreover, by avoiding backpropagation, each update skips the backward pass and doesn't need to store intermediate activations, which saves both computation and memory.
- Very general: The generality of ES is mainly due to its property of being a black-box optimization method. Because we don't care about the agent, the actions that it performs, or the states visited, ...
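To make the derivative-free, black-box idea concrete, here is a minimal sketch in plain Python. It is not the algorithm developed in this chapter: it is a simple elitist (1+λ)-style strategy, chosen only because it is short. The `fitness` function, its target values, and the `evolve` helper with its default parameters are all hypothetical names introduced for illustration. The fitness rounds its inputs, so it is piecewise constant and has no useful gradient, yet the optimizer only ever calls it in the forward direction.

```python
import random


def fitness(params):
    """Hypothetical black-box score (higher is better).

    Rounding makes the function piecewise constant, so its gradient is
    zero almost everywhere and backpropagation would be of no use.
    """
    target = [3, -2, 5]  # assumed target, for illustration only
    return -sum(abs(round(p) - t) for p, t in zip(params, target))


def evolve(fitness_fn, init_params, population_size=20, sigma=0.5,
           generations=200, seed=0):
    """Elitist (1+lambda)-style search using only forward evaluations."""
    rng = random.Random(seed)
    parent = list(init_params)
    parent_score = fitness_fn(parent)
    for _ in range(generations):
        # Perturb the parent with Gaussian noise to create the offspring.
        offspring = [[p + rng.gauss(0.0, sigma) for p in parent]
                     for _ in range(population_size)]
        # Evaluate each candidate: a forward pass only, no derivatives.
        scores = [fitness_fn(child) for child in offspring]
        top = max(range(population_size), key=scores.__getitem__)
        # Keep the best child only if it is at least as good as the parent.
        if scores[top] >= parent_score:
            parent, parent_score = offspring[top], scores[top]
    return parent, parent_score


start = [0.0, 0.0, 0.0]
best_params, best_score = evolve(fitness, start)
print("initial fitness:", fitness(start))
print("final fitness:", best_score)
```

Note that `evolve` never inspects what happens inside `fitness_fn`. Swapping in the cumulative reward of an agent's episode would, under the same assumptions, require no change to the optimizer itself.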