In the Beginning

Decades ago, when asked what we thought was “beautiful evidence,” we would have laid out some combination of the following traits:

Elegance of studies

Many studies in software engineering include human factors, because the effectiveness of many software technologies depends heavily on the people who are using them.[1] But human variability is hard to control for. Studies with sophisticated designs that minimize these confounding effects can be a source of admiration for other researchers. For example, a study by Basili and Selby [Basili and Selby 1987] used a fractional factorial design, in which every developer used every technique under examination, and every technique was used on every program snippet in the experiment.
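
To make that kind of design concrete, here is a minimal sketch in Python of a balanced, Latin-square-style assignment: every developer uses every technique, and every technique is applied to every program snippet, yet only a fraction of all possible (developer, technique, program) combinations is actually run. The developer, technique, and program names are purely illustrative; this is not Basili and Selby's actual design or materials.

# Hypothetical subjects, treatments, and materials (illustrative only).
developers = ["dev_A", "dev_B", "dev_C"]
techniques = ["code_reading", "functional_testing", "structural_testing"]
programs   = ["prog_1", "prog_2", "prog_3"]

trials = []
for i, developer in enumerate(developers):
    for j, technique in enumerate(techniques):
        # A cyclic shift yields a Latin square: every developer uses every
        # technique, and every technique is applied to every program,
        # without running all 27 possible combinations.
        program = programs[(i + j) % len(programs)]
        trials.append((developer, technique, program))

for developer, technique, program in trials:
    print(f"{developer}: {technique} on {program}")

total = len(developers) * len(techniques) * len(programs)
print(f"{len(trials)} trials run out of {total} possible combinations")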

Statistical strength

As mathematical sophistication grows, so too does an emphasis on statistically significant results, so that researchers can be confident that their theory has some real-world effect that can be picked out from random background noise.
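
As a concrete illustration of separating an effect from noise, the sketch below runs a two-sample significance test on invented defect counts for two hypothetical techniques. The data are synthetic and the thresholds are conventional; the point is only that a small p-value suggests the observed difference would be unlikely if the techniques were really equivalent.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic defect counts per module for two hypothetical techniques;
# the numbers are made up purely to illustrate the test.
defects_technique_a = rng.poisson(lam=4.0, size=30)
defects_technique_b = rng.poisson(lam=6.0, size=30)

# Mann-Whitney U test: a non-parametric check of whether one sample
# tends to yield larger values than the other.
statistic, p_value = stats.mannwhitneyu(
    defects_technique_a, defects_technique_b, alternative="two-sided"
)

print(f"U = {statistic:.1f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is unlikely to be random noise (at the 5% level).")
else:
    print("No convincing evidence of a real difference.")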

Replicability of results

Results are far more convincing when they are found again and again across many different contexts and sets of experimental conditions. In other sciences, replication builds confidence, and for this reason much effort has been expended to make software engineering experiments easy to rerun by other researchers in other contexts [Basili et al. 1999]. As an example of replicability, Turhan and colleagues showed that software defect predictors learned at other sites could be successfully applied at a new site [Turhan et al. 2009].
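
To suggest what applying a predictor at a new site looks like, here is a minimal sketch that trains a Naive Bayes defect predictor on synthetic code metrics from one "site" and evaluates it on another. The metrics, sites, and defect rates are invented, and the code is a simplified stand-in rather than the exact procedure of Turhan et al.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score

rng = np.random.default_rng(1)

def synthetic_site(n_modules, defect_rate):
    """Invented static code metrics (size, complexity, churn) plus labels."""
    y = (rng.random(n_modules) < defect_rate).astype(int)
    # In this toy data, defective modules are on average larger and more complex.
    X = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n_modules, 3))
    return X, y

# "Other site": the data the predictor is learned from.
X_other, y_other = synthetic_site(n_modules=500, defect_rate=0.2)
# "New site": never seen during training.
X_new, y_new = synthetic_site(n_modules=200, defect_rate=0.25)

model = GaussianNB().fit(X_other, y_other)
predictions = model.predict(X_new)

print("Recall on the new site:", round(recall_score(y_new, predictions), 2))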



[1] After studying 161 projects, Boehm [Boehm et al. 2000] found that the best project personnel are 3.5 times more productive than the worst.
