How can we deal with cases where it is unable to linearly segregate a set of observations containing outliers? We can actually allow misclassification of such outliers and try to minimize the error introduced. The misclassification error (also called hinge loss) for a sample can be expressed as follows:
Together with the ultimate term ‖w‖ to reduce, the final objective value we want to minimize becomes the following: ...