We start with perhaps the most basic classifier, logistic regression; more specifically, multinomial logistic regression, since this is a multiclass problem. It is a probabilistic linear classifier parameterized by a weight matrix *W* (also called the coefficient matrix) and a bias vector *b* (also called the intercept). It maps an input vector *x* to a set of probabilities *P(y=1), P(y=2), ..., P(y=K)* for *K* possible classes.
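To make the mapping from *x* to class probabilities concrete, here is a minimal sketch in plain Python. It assumes the standard softmax form of multinomial logistic regression, with *W* stored as *n* rows by *K* columns. The function name `predict_proba` and the numeric values are illustrative assumptions, not taken from any particular library or dataset.

```python
import math

def predict_proba(x, W, b):
    """Return [P(y=1), ..., P(y=K)] for a single input vector x.

    x: list of n floats
    W: weight matrix as n rows, each a list of K floats (assumed layout)
    b: bias vector as a list of K floats
    """
    n, K = len(W), len(b)
    # Linear score for class k: dot product of x with column k of W, plus b[k].
    scores = [sum(x[i] * W[i][k] for i in range(n)) + b[k] for k in range(K)]
    # Softmax turns the scores into probabilities; subtracting the maximum
    # score first avoids overflow in exp() without changing the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example with n = 3 features and K = 2 classes.
x = [1.0, 2.0, 3.0]
W = [[0.1, -0.1],
     [0.2, 0.0],
     [-0.3, 0.3]]
b = [0.0, 0.1]
probs = predict_proba(x, W, b)
print(probs)
```

The returned list always sums to 1, and the class with the largest linear score receives the largest probability. In practice *W* and *b* are learned from training data rather than set by hand.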

A multinomial logistic regression for two possible classes can be represented graphically as follows:

Suppose *x* is *n*-dimensional; then the weight matrix *W* is of size *n* by *K*, with each column *W_k* representing ...