January 2018
Beginner to intermediate
284 pages
8h 35m
English
For classification problems where each sample belongs to exactly one class, mean squared error (MSE, also called L2 loss) and cross-entropy loss are widely used. Softmax is often used for the last layer, and when the number of classes is very large, hierarchical softmax is an option. Hinge loss and squared hinge loss are also suitable for this case. As a side note, remember that softmax acts as a squashing function that assigns a probability to each class, with the probabilities summing to one, so the output probability of one class is not independent of the other class probabilities.
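The coupling between softmax outputs can be illustrated with a minimal sketch in plain Python. The function names `softmax` and `cross_entropy` and the example scores are assumptions made for illustration and do not come from the original text.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that sum to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[target_index])


# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])

# Raise only the second score; the first and third scores are unchanged.
shifted = softmax([2.0, 3.0, 0.1])

# Cross-entropy loss when the first class is the true label.
loss = cross_entropy(probs, target_index=0)

print(sum(probs))             # approximately 1.0
print(probs[0], shifted[0])   # the first probability drops after the shift
print(loss)
```

Because every output shares the same normalizing sum, raising one score lowers the probabilities of all other classes even though their own scores did not change.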