15 Binary Response Variable

Many statistical problems involve binary response variables. For example, we often classify things as dead or alive, occupied or empty, healthy or diseased, male or female, literate or illiterate, mature or immature, solvent or insolvent, employed or unemployed, and it is interesting to understand the factors that are associated with an individual being in one class or the other. In a study of company insolvency, for instance, the data would consist of a list of measurements made on the insolvent companies (their age, size, turnover, location, management experience, workforce training, and so on) and a similar list for the solvent companies. The question then becomes which, if any, of the explanatory variables increase the probability of an individual company being insolvent.

The response variable contains only 0s or 1s; for example, 0 to represent dead individuals and 1 to represent live ones. Thus, there is only a single column of numbers for the response, in contrast to proportion data, where two vectors (successes and failures) were bound together to form the response (see Chapter 14). Alternatively, R allows the values of the response variable to be represented by a two-level factor (like ‘dead’ or ‘alive’, ‘male’ or ‘female’, etc.).
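The two ways of coding the response can be sketched in R as follows. This is only an illustration, not code taken from the chapter: the data values and the names `age`, `alive01`, and `status` are invented for the example. It relies on the documented behaviour of `glm` with `family = binomial`, where the first level of a factor response is treated as the failure and the other level as the success.

```r
# Hypothetical toy data, invented purely for illustration
age     <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
alive01 <- c(1, 1, 1, 0, 1, 1, 0, 0, 1, 0)   # 0 = dead, 1 = alive

# The same response as a two-level factor. The levels are set explicitly so
# that "dead" comes first: glm() treats the first level as failure (0) and
# the second as success (1).
status <- factor(ifelse(alive01 == 1, "alive", "dead"),
                 levels = c("dead", "alive"))

# Fit the same binomial model with each coding of the response
m_numeric <- glm(alive01 ~ age, family = binomial)
m_factor  <- glm(status ~ age, family = binomial)

# Compare the estimated coefficients from the two fits
same_fit <- isTRUE(all.equal(coef(m_numeric), coef(m_factor)))
print(same_fit)
```

With the levels ordered this way, the two fits should agree. If the levels were left in the default alphabetical order (‘alive’ before ‘dead’), the model would instead describe the probability of being dead, and the signs of the coefficients would be reversed.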

The way that R treats binary data is to assume that the values of the response come from a binomial trial with sample size 1. If the probability that an individual is dead is p, then the probability of ...
