## Testing for Lack of Fit in a Regression with Replicated Data at Each Level of *x*

The unreliability estimates of the parameters explained in Boxes 10.2 and 10.3 draw attention to the important issues in optimizing the efficiency of regression designs. We want to make the error variance as small as possible (as always), but in addition, we want to make *SSX* as large as possible, by placing as many points as possible at the extreme ends of the *x* axis. Efficient regression designs allow for:

- replication of at least some of the levels of *x*;
- a preponderance of replicates at the extremes (to maximize *SSX*);
- enough different values of *x* to allow accurate location of thresholds.
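To see why replicates at the extremes pay off, recall that the standard error of the slope is the square root of *s*²/*SSX*, where *SSX* is the sum of squared deviations of *x* about its mean. The following is a minimal sketch comparing two six-point designs over the same range. The vectors `x.even` and `x.ends` and the helper `ssx` are hypothetical names introduced for illustration only; they are not part of the example data.

```r
# Hypothetical helper: corrected sum of squares of x
ssx <- function(x) sum((x - mean(x))^2)

# Two made-up designs with six points each on the range 0 to 10
x.even <- c(0, 2, 4, 6, 8, 10)     # evenly spaced, no replication
x.ends <- c(0, 0, 0, 10, 10, 10)   # three replicates at each extreme

ssx(x.even)   # 70
ssx(x.ends)   # 150

# Ratio of slope standard errors (ends versus even) for the same error variance
sqrt(ssx(x.even) / ssx(x.ends))   # about 0.68
```

For the same error variance, the end-loaded design would give a slope standard error roughly a third smaller. Its cost is that, with only two levels of *x*, it cannot detect curvature at all, which is why the list above also asks for enough different values of *x*.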

Here is an example where replication allows estimation of pure sampling error, and this in turn allows a test of the significance of the data's departure from linearity. As the concentration of an inhibitor is increased, the reaction rate declines:

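```r
# Read the tab-delimited data and make its columns available by name
data <- read.delim("c:\\temp\\lackoffit.txt")
attach(data)
names(data)
# [1] "conc" "rate"

# Scatterplot of the data with the fitted least-squares line
plot(conc, rate, pch = 16, ylim = c(0, 8))
abline(lm(rate ~ conc))
```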

The linear regression does not look too bad, and the slope is highly significantly different from zero:

```r
model.reg <- lm(rate ~ conc)
summary.aov(model.reg)
#             Df Sum Sq Mean Sq F value    Pr(>F)
# conc         1 74.298  74.298  55.333 4.853e-07 ***
# Residuals   19 25.512   1.343
```

Because there is replication at each level of *x*, we can do something extra compared with a typical regression analysis. We can estimate what is called the **pure error variance ...**
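One standard way to use replication for a lack-of-fit test can be sketched as follows. This is an illustration under stated assumptions, not necessarily the analysis the text goes on to present. The data frame `sim` is simulated and hypothetical (it does not reproduce the values in `lackoffit.txt`), and the names `fit.line`, `fit.means`, `pure.ss`, `pure.df` and `lof` are introduced here only for the sketch. The idea is to compare the straight-line model with a model that fits a separate mean at each level of *x*; the residual variation of the second model is the pure error.

```r
# Hypothetical data, simulated for illustration only (NOT the lackoffit.txt values)
set.seed(1)
sim <- data.frame(conc = rep(0:6, each = 3))     # 7 levels of x, 3 replicates each
sim$rate <- 7 - 1.2 * sim$conc + 0.08 * sim$conc^2 + rnorm(nrow(sim), sd = 0.5)

fit.line  <- lm(rate ~ conc, data = sim)         # straight line: 2 parameters
fit.means <- lm(rate ~ factor(conc), data = sim) # one mean per level: 7 parameters

# Pure error: variation of the replicates about their own level means
pure.ss <- sum(tapply(sim$rate, sim$conc, function(y) sum((y - mean(y))^2)))
pure.df <- nrow(sim) - length(unique(sim$conc))  # 21 - 7 = 14
pure.ss / pure.df                                # pure error variance

# Lack of fit: the extra residual variation of the line beyond pure error,
# tested as an F ratio against the pure error variance
lof <- anova(fit.line, fit.means)
lof
```

In the table printed by `anova`, the second row carries the lack-of-fit sum of squares and its F test. A small p-value there would suggest that the departure from linearity is larger than pure sampling error can explain.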