Understanding Activation Functions

By now, you’re familiar with activation functions: those cyan boxes between a neural network’s layers, as shown in the diagram.


All our activation functions so far have been sigmoids, except in the output layer, where we used the softmax function.

The sigmoid has been with us for a long time. I originally introduced it to squash the output of a perceptron so that it ranged from 0 to 1. Later on, I introduced the softmax to rescale a neural network’s outputs so that they added up to 1. By rescaling the outputs, we could interpret them as probabilities, as in: “we have a 30% chance that this picture contains a platypus.” ...
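To make those two roles concrete, here is a minimal sketch of both functions in plain Python. It is an illustration rather than the book's own code: the function names `sigmoid` and `softmax`, the sample inputs, and the choice to subtract the maximum before exponentiating (a common trick to avoid overflow) are all assumptions made for this example.

```python
import math

def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(values):
    """Rescale a list of numbers into positive values that add up to 1."""
    # Subtracting the maximum does not change the result,
    # but it keeps math.exp() from overflowing on large inputs.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical inputs, chosen only for illustration.
print(sigmoid(0.0))  # 0.5

probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities)       # three positive numbers, largest last
print(sum(probabilities))  # 1, up to floating-point rounding
```

Because the softmax outputs are positive and add up to 1, each one can be read as a probability, which is what lets us make statements like the platypus one above.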
