The goal of a logistic regression analysis is to find the best-fitting and most parsimonious, yet biologically reasonable, model to describe the relationship between an outcome (dependent or response variable) and a set of independent (predictor or explanatory) variables. What distinguishes the logistic regression model from the **linear regression** model is that the outcome variable in logistic regression is categorical and most often *binary* or *dichotomous.*

In any regression problem the key quantity is the mean value of the outcome variable, given the value of the independent variable. This quantity is called the *conditional mean* and is written *E*(*Y*|*x*), where *Y* denotes the outcome variable and *x* denotes a value of the independent variable. In linear regression we assume that this mean may be expressed as an equation linear in *x* (or some transformation of *x* or *Y*), such as

*E*(*Y*|*x*) = *β*₀ + *β*₁*x*

This expression implies that it is possible for *E*(*Y*|*x*) to take on any value as *x* ranges between −∞ and +∞.
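A minimal sketch of this unboundedness, assuming a straight-line conditional mean of the form `beta0 + beta1 * x`. The function name `linear_mean` and the coefficient values are hypothetical, chosen only for illustration.

```python
def linear_mean(x, beta0=0.5, beta1=0.1):
    """Conditional mean E(Y|x) under an assumed straight-line model.

    beta0 and beta1 are illustrative values, not estimates from any data.
    """
    return beta0 + beta1 * x


# Evaluate the mean over an increasingly wide range of x values.
xs = [-1000.0, -10.0, 0.0, 10.0, 1000.0]
means = [linear_mean(x) for x in xs]

for x, m in zip(xs, means):
    print(f"x = {x:8.1f}   E(Y|x) = {m:8.2f}")
```

With a nonzero slope, pushing *x* far enough in either direction moves the computed mean past any fixed bound, which is the property the text describes.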

Many distribution functions have been proposed for use in the analysis of a dichotomous outcome variable. Cox and Snell [2] discuss some of these. There are two primary reasons for choosing the logistic distribution: (i) from a mathematical point of view, it is an extremely flexible and easily used function, and (ii) ...
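To give a feel for why the logistic distribution is easy to work with, here is a small sketch of its distribution function, assuming the standard form 1 / (1 + e^(−z)) evaluated at a linear combination z = beta0 + beta1 * x. The excerpt does not state this form, so the parameterization, the helper names `logistic` and `logistic_mean`, and the coefficient values are assumptions made for illustration.

```python
import math


def logistic(z):
    """Standard logistic distribution function 1 / (1 + exp(-z)).

    The two branches are algebraically identical; splitting on the sign
    of z avoids overflow in math.exp for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_mean(x, beta0=0.0, beta1=1.0):
    """Hypothetical conditional mean: logistic function of beta0 + beta1*x."""
    return logistic(beta0 + beta1 * x)


# Evaluate on a few points; every value lies between 0 and 1.
values = [logistic_mean(x) for x in (-10.0, -1.0, 0.0, 1.0, 10.0)]
for v in values:
    print(f"{v:.6f}")
```

Unlike a straight line, this function increases smoothly with *x* yet never leaves the interval from 0 to 1, so it can represent the mean of a 0/1 outcome.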
