Even though it is called regression, this is a classification method based on the probability that a sample belongs to a class. As our probabilities must be continuous in ℜ and bounded in (0, 1), it's necessary to introduce a threshold function to filter the term z, a linear combination of the input features. As we did with linear regression, we can get rid of the extra parameter corresponding to the intercept by appending a 1 to the end of each input vector:
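As a sketch of what the augmented vector looks like, assuming each sample has m features (the superscript component notation and the bar are chosen here for illustration and may differ from the original formula):

```latex
\bar{x}_i = \left( x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(m)}, 1 \right)
```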
In this way, we can consider a single parameter vector θ, containing m + 1 elements, and compute the z-value with a dot product:
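The following is a minimal NumPy sketch of this idea, not a definitive implementation. The data, the values in `theta`, and the names `X`, `X_aug`, and `z_explicit` are hypothetical and chosen only for illustration. It shows that, once a 1 is appended to each input vector, the last element of θ plays the role of the intercept:

```python
import numpy as np

# Hypothetical data: 3 samples with m = 2 features each (illustrative values).
X = np.array([[0.5, 1.5],
              [2.0, -1.0],
              [-0.5, 0.25]])

# Append a constant 1 to the end of each input vector.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Single parameter vector with m + 1 elements (assumed values);
# the last element acts as the intercept.
theta = np.array([1.0, -2.0, 0.5])

# z-value of every sample, computed as a dot product with theta.
z = X_aug @ theta

# Equivalent form with separate weights and an explicit intercept term.
z_explicit = X @ theta[:-1] + theta[-1]

print(z)
```

Both forms give the same z-values, so a single dot product is enough and no separate intercept term has to be tracked.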
Now, let's suppose we introduce the probability p(xi) that an element ...