The following properties are worth considering when choosing an activation function:
- Non-linearity: If the activation function is non-linear, it can be proved that even a two-layer neural network is a universal function approximator.
- Continuous differentiability: This property is desirable because it enables gradient-based optimization methods such as gradient descent.
- Value range: If the range of the activation function is bounded, gradient-based learning methods are more stable and less prone to numerical errors, since no large values arise. If the range is unbounded, training is usually more effective, but care must be taken to avoid exploding ...
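
The differentiability and value-range properties above can be checked numerically. The following is a minimal sketch using only the Python standard library. The helper names `sigmoid`, `relu`, and `one_sided_slopes` are illustrative assumptions, not taken from the text, and the three functions are used only as familiar examples of bounded and unbounded activations.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x: float) -> float:
    """Rectified linear unit: unbounded above, with a kink at 0."""
    return max(0.0, x)


def one_sided_slopes(f, x: float, h: float = 1e-6):
    """Return (left, right) finite-difference slopes of f at x.

    If the two slopes differ noticeably, f is not differentiable at x.
    """
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


inputs = [-100.0, -1.0, 0.0, 1.0, 100.0]
for name, f in [("sigmoid", sigmoid), ("tanh", math.tanh), ("relu", relu)]:
    outputs = [f(x) for x in inputs]
    left, right = one_sided_slopes(f, 0.0)
    # Bounded activations keep outputs in a fixed interval;
    # ReLU outputs grow with the input.
    print(f"{name}: min={min(outputs)}, max={max(outputs)}, "
          f"slopes at 0: left={left:.3f}, right={right:.3f}")
```

In this sketch the sigmoid and tanh outputs stay within fixed intervals however large the input becomes, while the ReLU output grows with it. The one-sided slopes agree at 0 for sigmoid and tanh but not for ReLU, which is continuous but not differentiable at that point.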