Understanding Activation Functions

By now, you’re familiar with activation functions—those cyan boxes between a neural network’s layers shown in the diagram.
All our activation functions so far have been sigmoids, except in the output layer, where we used the softmax function.

The sigmoid has been with us for a long time. I originally introduced it to squash the output of a perceptron so that it ranged from 0 to 1. Later on, I introduced the softmax to rescale a neural network’s outputs so that they added up to 1. By rescaling the outputs, we could interpret them as probabilities, as in: “we have a 30% chance that this picture contains a platypus.” ...
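To make the two functions concrete, here is a minimal NumPy sketch of both (the function names and the stability trick in `softmax` are mine, not from the text):

```python
import numpy as np

def sigmoid(z):
    # Squash each value into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def softmax(logits):
    # Rescale values so they are positive and add up to 1,
    # letting us read them as probabilities.
    # Subtracting the max is a common numerical-stability trick.
    exponentials = np.exp(logits - np.max(logits))
    return exponentials / np.sum(exponentials)
```

For example, `sigmoid(0)` returns 0.5, and the outputs of `softmax` always sum to 1, so a value of 0.3 for the "platypus" class can be read as a 30% probability.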
