Chapter 16

Proportion Data

An important class of problems involves count data on proportions such as:

  • studies on death rates,
  • infection rates of diseases,
  • answers to questionnaires,
  • proportion responding to clinical treatment,
  • proportion admitting to particular voting intentions,
  • sex ratios, or
  • data on proportional response to an experimental treatment.

What all these have in common is that we know how many of the experimental objects are in one category (dead, insolvent, male or infected) and we also know how many are in another (alive, solvent, female or uninfected). This contrasts with Poisson count data, where we knew how many times an event occurred, but not how many times it did not occur (p. 579).

We model processes involving proportional response variables in R by specifying a generalized linear model with family=binomial. The only complication is that whereas with Poisson errors we could simply specify family=poisson, with binomial errors we must give the number of failures as well as the number of successes in a two-column response variable. To do this we use cbind to bind two vectors together into a single object, y, comprising the numbers of successes and the numbers of failures. The binomial denominator, n, is the total sample size, and

number.of.failures <- binomial.denominator - number.of.successes
y <- cbind(number.of.successes, number.of.failures)
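As a minimal sketch of how such a two-column response might be passed to glm, the example below fits a binomial model to a small made-up data set. The data values, the explanatory variable dose, and the object name model are assumptions introduced purely for illustration; they do not come from the text.

```r
# Hypothetical data (illustration only): five dose levels,
# with 20 individuals tested at each dose
dose <- c(1, 2, 4, 8, 16)
binomial.denominator <- rep(20, 5)
number.of.successes <- c(2, 5, 9, 14, 18)

# Build the two-column response: successes first, failures second
number.of.failures <- binomial.denominator - number.of.successes
y <- cbind(number.of.successes, number.of.failures)

# Fit a generalized linear model with binomial errors
# (the default link for family = binomial is the logit)
model <- glm(y ~ log(dose), family = binomial)
print(summary(model))

# Fitted proportions of successes at each dose
print(fitted(model))
```

The order of the columns matters: glm treats the first column as the successes and the second as the failures, so swapping them would reverse the sign of every coefficient.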

The old-fashioned way of modelling this sort of data was to use the percentage mortality as the response variable. There ...
