♣ 22 ♣Classification Models

Classification models do not try to predict a continuous variable, but rather try to predict if an observation belongs to a certain discrete class. For example, we can predict if a customer would be creditworthy or not, or we can train a neural network to classify animals or plants on a picture. Thesemodels with discrete outcome have an equally large domain of use and most essential to any business application. The first one that we will discuss, the logistic regression, does only binary prediction but is very transparent and allows to build strong models.

22.1. Logistic Regression

Logistic regression (aka logit regression) is a regression model where the unknown variable is categorical (can have only a limited number of values): it can either be “0” or “1.” In reality if you can refer to anymutually exclusive concept such as: repay/default, pass/fail, win/lose, survive/die, or healthy/sick.

Cases where the dependent variable has more than two outcome categories may be analysed in multinomial logistic regression, or, if the multiple categories are ordered, in ordinal logistic regression. In the terminology of economics, logistic regression is an example of a qualitative response/discrete choices.

In its most general form, the logistic regression is defined as follows.

Get The Big R-Book now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.