The functions are explained below:
- The threshold activation function was used by the McCulloch-Pitts neuron and the early perceptrons. It is discontinuous at x=0, so it is not differentiable there, and its derivative is zero everywhere else. Gradient descent and its variants therefore receive no useful gradient signal, so the function cannot be used to train a network with them.
- The sigmoid activation function was very popular at one time. Its curve looks like a continuous, smooth version of the threshold activation function. It suffers from the vanishing gradient problem: the function saturates for inputs of large magnitude, so its gradient approaches zero toward both ends of the curve. This makes training and optimization difficult.
- Hyperbolic Tangent activation function is again sigmoidal in shape and has nonlinear properties. The function ...
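
The gradient behaviour described in this list can be checked numerically. The following is a minimal sketch rather than code from the text: the helper names (`threshold`, `sigmoid_grad`, `tanh_grad`, `numeric_grad`) are hypothetical, and the choice that the threshold function outputs 1 at x=0 is an assumption. It uses only the Python standard library.

```python
import math

def threshold(x):
    # Step activation. Returning 1 at exactly x = 0 is an assumed convention.
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    # Logistic sigmoid. Fine for moderate x; math.exp overflows for very negative x.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Analytic derivative: sigmoid(x) * (1 - sigmoid(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # Analytic derivative: 1 - tanh(x)^2.
    return 1.0 - math.tanh(x) ** 2

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference, used here to probe the non-differentiable step.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

if __name__ == "__main__":
    for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
        print(
            f"x={x:6.1f}  "
            f"step'={numeric_grad(threshold, x):10.1f}  "
            f"sigmoid'={sigmoid_grad(x):.6f}  "
            f"tanh'={tanh_grad(x):.6f}"
        )
```

Away from x=0 the finite-difference slope of the step function is exactly zero, and at x=0 it grows without bound as `eps` shrinks, which is why gradient descent cannot use it. The sigmoid and tanh gradients peak at x=0 and shrink toward zero as |x| grows, which is the saturation behind the vanishing gradient problem.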