The first opportunity arose after I had collected a number of implementations of the same program for a different study. That study [Prechelt and Unger 2001] measured the effects of receiving training in Watts Humphrey’s Personal Software Process (PSP, [Humphrey 1995]). The PSP claims big improvements in estimation accuracy, defect density, and productivity, yet our study found the effect to be much smaller than expected.
Half of the graduate student subjects in that study had received PSP training, and the others had received other programming-related training. In the study, they all solved exactly the same task (described in the next section) but could choose their programming language freely. I ended up with 40 implementations (24 in Java, 11 in C++, and 5 in C), and it occurred to me that it would be quite interesting not just to compare the PSP-trained programmers against the non-PSP-trained programmers, but also to treat the programs as three sets stemming from different languages. It would be even more interesting to have further implementations in several scripting languages to compare.
I posted a public “call for implementations” in several Usenet newsgroups (this was in 1999), and within four weeks received another 40 implementations from volunteers: 13 in Perl, 13 in Python, 4 in Rexx, and 10 in Tcl.
At this point, your skeptical brain ought to yell, “Wait! How can he know these people are of comparable competence?” A ...