December 2018
Beginner to intermediate
684 pages
21h 9m
English
The output layer compresses the three-dimensional hidden layer activations, H, back to two dimensions using a 3 x 2 weight matrix, Wo, and a two-dimensional bias vector, bo, as follows:

The linear combination of the hidden layer outputs results in an N x 2 matrix, Zo, as follows:

The output layer activations are computed by the softmax function, ς, which normalizes the Zo to conform to the conventions used for discrete probability distributions, as follows:
We define a softmax function in Python as follows:
def softmax(z): ...