In our next example the response variable is a count of infected blood cells per mm² on microscope slides prepared from randomly selected individuals. The explanatory variables are smoker (logical, yes or no), age (three levels, under 20, 21 to 59, 60 and over), sex (male or female) and body mass score (three levels, normal, overweight, obese).
count<-read.table("c:\\temp\\cells.txt",header=T)
attach(count)
names(count)
[1] "cells" "smoker" "age" "sex" "weight"
It is always a good idea with count data to get a feel for the overall frequency distribution of counts using table:
table(cells)
  0   1   2   3   4   5   6   7
314  75  50  32  18  13   7   2
Most subjects (314 of them) showed no infected cells, and the maximum count of 7 was observed in just two subjects.
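As a rough check on the shape of this distribution, the raw counts can be rebuilt from the tabulated frequencies and summarised. The sketch below is only illustrative: the names counts, freq and cells_rebuilt are introduced here and are not part of the original analysis, and the frequencies are copied from the table output above.

```r
# Frequencies copied from the table(cells) output in the text.
counts <- 0:7
freq   <- c(314, 75, 50, 32, 18, 13, 7, 2)

# Rebuild a vector of counts with the same frequency distribution.
# Only the distribution is recovered, not which subject had which count.
cells_rebuilt <- rep(counts, times = freq)

n         <- length(cells_rebuilt)      # number of subjects
prop_zero <- mean(cells_rebuilt == 0)   # proportion of zero counts
m         <- mean(cells_rebuilt)        # mean count
v         <- var(cells_rebuilt)         # sample variance

print(c(n = n, prop_zero = prop_zero, mean = m, variance = v))
```

A variance well above the mean would hint at overdispersion relative to a Poisson distribution, which is worth bearing in mind when an error structure is chosen for a model of these counts.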
We begin data inspection by tabulating the main effect means:
tapply(cells,smoker,mean)
    FALSE      TRUE
0.5478723 1.9111111

tapply(cells,weight,mean)
   normal     obese      over
0.5833333 1.2814371 0.9357143

tapply(cells,sex,mean)
   female      male
0.6584507 1.2202643

tapply(cells,age,mean)
      mid       old     young
0.8676471 0.7835821 1.2710280
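The separate tapply calls can also be collected into a single step by looping over the names of the explanatory variables. The following is a minimal sketch under stated assumptions: the data frame toy and its values are invented purely for illustration (they are not the cells.txt data), and the names factors and main_effect_means are hypothetical.

```r
# Toy data frame standing in for the real data; the values are invented
# purely for illustration and are NOT the cells.txt measurements.
toy <- data.frame(
  cells  = c(0, 2, 1, 4, 0, 3),
  smoker = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  sex    = c("female", "male", "male", "female", "female", "male")
)

# Names of the explanatory variables to summarise (hypothetical selection).
factors <- c("smoker", "sex")

# One tapply() call per explanatory variable, collected in a named list.
main_effect_means <- lapply(
  setNames(factors, factors),
  function(f) tapply(toy$cells, toy[[f]], mean)
)

print(main_effect_means)
```

Indexing the data frame explicitly, as in toy$cells, also avoids relying on attach, so the same pattern should work with the real data frame by substituting its name and the full set of explanatory variables.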
It looks as if smokers had a substantially higher mean count than non-smokers, overweight and obese subjects had higher mean counts than subjects of normal weight, males had a higher mean count than females, and young subjects had a higher mean count than middle-aged or older people. We need to test whether any of these differences are significant and to assess whether there are interactions between the explanatory ...