**Softmax** normalizes or squashes a vector of arbitrary values to a probability distribution between 0 and 1. The sum of the softmax output will be equal to 1. Therefore, it is commonly used in the last layer of a neural network to predict probabilities of the possible output classes. The following is the mathematical expression for the softmax function for a vector with *j *values:

Here *z*_{j }represents the *jth* vector value and *K* represents the number of classes. As we can see the exponential function smoothens the output value while the denominator normalizes the final value between 0 and 1.