Chapter 21

Logistic Regression

Stanley Lemeshow and David W. Hosmer

21.1 Introduction

The goal of a logistic regression analysis is to find the best-fitting and most parsimonious, yet biologically reasonable, model to describe the relationship between an outcome (dependent or response variable) and a set of independent (predictor or explanatory) variables. What distinguishes the logistic regression model from the linear regression model is that the outcome variable in logistic regression is categorical, most often binary or dichotomous.

In any regression problem the key quantity is the mean value of the outcome variable, given the value of the independent variable. This quantity is called the conditional mean and will be expressed as E(Y|x), where Y denotes the outcome variable and x denotes a value of the independent variable. In linear regression we assume that this mean may be expressed as an equation linear in x (or some transformation of x or Y), such as

$$E(Y \mid x) = \beta_0 + \beta_1 x$$

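This expression implies that E(Y|x) can take on any value as x ranges between −∞ and +∞. With a dichotomous outcome, however, the conditional mean is a probability and must satisfy 0 ≤ E(Y|x) ≤ 1, so a model that is linear in x cannot hold over the whole range of x.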
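The following is a minimal Python sketch of this point. It assumes hypothetical coefficients β0 = 0.2 and β1 = 0.1, which are not taken from the chapter. The sketch evaluates the linear conditional mean at several values of x and reports whether each value falls inside the unit interval, which is the range the mean of a 0/1 outcome must respect.

```python
# Hypothetical coefficients chosen only for illustration; not from the chapter.
BETA0 = 0.2
BETA1 = 0.1


def linear_mean(x: float, beta0: float = BETA0, beta1: float = BETA1) -> float:
    """Conditional mean E(Y|x) = beta0 + beta1 * x under a linear model."""
    return beta0 + beta1 * x


xs = [-20, -5, 0, 5, 20]
means = [linear_mean(x) for x in xs]

for x, m in zip(xs, means):
    # A mean of a 0/1 outcome is a probability, so it should lie in [0, 1].
    inside = 0.0 <= m <= 1.0
    print(f"x = {x:>4}: E(Y|x) = {m:6.2f}  inside [0, 1]: {inside}")
```

With these coefficients the linear form gives a negative "probability" at x = −20 and a value above 1 at x = 20. Any nonzero slope produces the same failure once x is extreme enough.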

Many distribution functions have been proposed for use in the analysis of a dichotomous outcome variable. Cox and Snell [2] discuss some of these. There are two primary reasons for choosing the logistic distribution: (i) From a mathematical point of view it is an extremely flexible and easily used function, and (ii) ...
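The logistic model is conventionally written as π(x) = e^(β0 + β1x) / (1 + e^(β0 + β1x)). Its logit transformation, g(x) = ln[π(x) / (1 − π(x))] = β0 + β1x, is linear in x. The minimal Python sketch below checks both properties numerically. It again uses hypothetical coefficients β0 = 0.2 and β1 = 0.1, and the helper names `logistic_mean` and `logit` are illustrative only.

```python
import math

# Hypothetical coefficients chosen only for illustration; not from the chapter.
BETA0, BETA1 = 0.2, 0.1


def logistic_mean(x: float, beta0: float, beta1: float) -> float:
    """pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)), strictly between 0 and 1."""
    z = beta0 + beta1 * x
    # Branch on the sign of z so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    """g = ln[p / (1 - p)], the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


xs = [-20, -5, 0, 5, 20]
probs = [logistic_mean(x, BETA0, BETA1) for x in xs]

for x, p in zip(xs, probs):
    # The logit of pi(x) recovers the linear predictor beta0 + beta1 * x.
    print(f"x = {x:>4}: pi(x) = {p:.4f}  logit = {logit(p):6.2f}")
```

Unlike the linear form, π(x) stays inside (0, 1) for every x and increases monotonically when β1 > 0. On the logit scale, the model keeps the familiar linear structure β0 + β1x.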
