Neural networks are applied to common supervised learning problems and therefore use familiar output representations, computed from the final hidden layer's activations:
- Linear output units compute an affine transformation of the hidden layer activations and are common for regression problems, typically paired with an MSE cost.
- Sigmoid output units model a Bernoulli distribution, just like logistic regression, with the hidden activations as input.
- Softmax units generalize the logistic sigmoid and model a discrete distribution over more than two classes, as shown previously.
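The three output units above can be sketched in NumPy as follows; this is a minimal illustration, and the function names, shapes, and example values are chosen here for demonstration only:

```python
import numpy as np

def linear_output(h, W, b):
    # Affine transformation of hidden activations; common for
    # regression together with an MSE cost.
    return h @ W + b

def sigmoid_output(h, w, b):
    # Bernoulli parameter for binary classification, as in
    # logistic regression applied to the hidden activations.
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))

def softmax_output(h, W, b):
    # Discrete distribution over more than two classes;
    # subtracting the max logit improves numerical stability.
    z = h @ W + b
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative hidden activations from the final hidden layer.
h = np.array([0.5, -1.0, 2.0])

# With zero weights and biases, softmax yields a uniform distribution.
p = softmax_output(h, np.zeros((3, 4)), np.zeros(4))
```

Note that the softmax output always sums to one, so it can be read directly as class probabilities, while the linear unit's output is unconstrained.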