As we already know, the output value of the $i$-th neuron in a layer is computed as follows:

$$O_i = f\left(\sum_{j} w_{ij} x_j + b_i\right)$$
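As a minimal sketch of this computation, the following NumPy snippet evaluates a single neuron's output; the sigmoid is just one possible choice for $f$, and the input, weight, and bias values are arbitrary examples:

```python
import numpy as np

def neuron_output(x, w, b):
    """Output of a single neuron: f(sum_j w_j * x_j + b), with f = sigmoid."""
    z = np.dot(w, x) + b              # weighted sum of the inputs plus bias
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid as the activation function f

x = np.array([0.5, -1.2, 3.0])  # example inputs to the neuron
w = np.array([0.1, 0.4, -0.7])  # example weights of the i-th neuron
b = 0.2                         # example bias of the i-th neuron
print(neuron_output(x, w, b))   # a single scalar output value
```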
The activation function, $f$, is important for several reasons:
- As stated in the previous section, depending on the layer it is applied to, the non-linearity determines how we can interpret the output of the neural network.
- If the input data is not linearly separable, its non-linearity allows the network to approximate a non-linear function capable of separating the data in a non-linear way (just think of the transformation of a hyperplane into a generic hypersurface).
- Without non-linearities between adjacent layers, a multi-layer neural network is equivalent to a single linear layer, because the composition of linear transformations is itself a linear transformation (see the sketch after this list).
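To make the last point concrete, here is a small NumPy sketch (the matrix shapes and random values are arbitrary) showing that two stacked layers without an activation function compute exactly the same map as a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked layers with NO activation function:
# y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2

# The same map collapses into a single linear layer: y = W @ x + b
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```

Since $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$, no amount of stacking adds expressive power unless a non-linearity is placed between the layers.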