Many studies in software engineering include human factors, because the effectiveness of many software technologies depends heavily on the people who use them. But dealing with human variability is very challenging, and studies whose designs control for this variability are often admired by other researchers. For example, a study by Basili and Selby [Basili and Selby 1987] used a fractional factorial design, in which every developer used every technique under examination and every technique was applied to every program snippet in the experiment, though not every developer applied every technique to every program.
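To make the shape of such a design concrete, here is a minimal Python sketch of a cyclic Latin-square assignment, which is one simple way to build a fractional crossover. It assumes equal numbers of techniques and programs. The function name `latin_square_assignment` and the technique and program labels are hypothetical, and the sketch is not a reconstruction of Basili and Selby's actual assignment.

```python
def latin_square_assignment(developers, techniques, programs):
    """Give each developer one (technique, program) pair per technique.

    Uses a cyclic Latin square: developer d applies technique t to
    program (t + d) mod n. Assumes len(techniques) == len(programs).
    """
    n = len(techniques)
    if len(programs) != n:
        raise ValueError("need as many programs as techniques")
    plan = {}
    for d, dev in enumerate(developers):
        # Rotate the program order by the developer's index, so each
        # developer uses every technique, each on a different program.
        plan[dev] = [(techniques[t], programs[(t + d) % n]) for t in range(n)]
    return plan


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    plan = latin_square_assignment(
        ["dev1", "dev2", "dev3"],
        ["reading", "functional", "structural"],
        ["P1", "P2", "P3"],
    )
    for dev, runs in plan.items():
        print(dev, runs)
```

With three developers, three techniques, and three programs, each developer applies all three techniques (each to a different program), and across the three developers every technique meets every program. The plan uses only 9 of the 27 possible developer-technique-program combinations, which is what makes it fractional rather than full factorial.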
As studies grow more mathematically sophisticated, so does the emphasis on statistically significant results, which let researchers be confident that the effect their theory describes is real and can be distinguished from random background noise.
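One common way to check whether an observed difference stands out from random background noise is a permutation test: shuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed. The sketch below is a generic two-sample version in Python; `permutation_test` and its parameters are hypothetical names, and it is not claimed to be the analysis used in any particular study.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: roughly the fraction of random
    relabelings whose absolute mean difference is at least as large
    as the observed one.
    """
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)
```

A small returned value (conventionally below 0.05) suggests the observed difference is unlikely to arise from random relabeling alone; a value near 1 means chance easily produces differences that large.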
Results are far more convincing when they are found again and again in many different contexts, rather than in a single context or set of experimental conditions. As in other sciences, replication builds confidence, and for this reason much effort has gone into making software engineering experiments easy for other researchers to rerun in other contexts [Basili et al. 1999]. As an example of results carrying across contexts, Turhan et al. showed that software defect predictors learned at other sites could be successfully applied at a new site [Turhan et al. 2009].