Understanding Activation Functions
By now, you’re familiar with activation functions—those cyan boxes between a neural network’s layers shown in the diagram.
All our activation functions so far have been sigmoids, except in the output layer, where we used the softmax function.
The sigmoid has been with us for a long time. I originally introduced it to squash the output of a perceptron so that it ranged from 0 to 1. Later on, I introduced the softmax to rescale a neural network’s outputs so that they added up to 1. By rescaling the outputs, we could interpret them as probabilities, as in: “we have a 30% chance that this picture contains a platypus.” ...
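To make these two roles concrete, here is a minimal sketch in plain Python (the function names and example values are my own, not from the text): the sigmoid squashes a single value into the range (0, 1), while the softmax rescales a whole vector of outputs so they sum to 1.

```python
import math

def sigmoid(z):
    # Squash any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    # Rescale the outputs so they are positive and sum to 1,
    # which lets us read them as probabilities.
    # Subtracting the max is a standard numerical-stability trick.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))            # 0.5, the midpoint of the sigmoid
print(softmax([2.0, 1.0, 0.1]))  # three values that add up to 1
```

With the example logits above, the largest input gets the largest share of the probability mass, so a statement like “30% chance this picture contains a platypus” is just one entry of the softmax output.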