Statistical analyses are subject to threats to the validity of the conclusions drawn from the data. Returning to the exit poll example given in the introduction to this chapter, the socioeconomic profile of the interviewees might bias the results of the exit poll. A robust exit poll requires a random sample; otherwise, the poll will not accurately predict the actual results of the election. Along the same lines, the results shown in this chapter might be biased by some threats to validity, and the conclusions might not hold for other software projects. For the sake of completeness, we discuss these threats to validity:
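The first problem you might have spotted concerns the level of significance of the correlations shown in the previous sections. Because of the statistical properties of the tests and the size of the samples (in the range of hundreds of thousands), this level of significance will always be very high (on the order of 99.99%). With samples this large, even a very weak correlation is statistically significant, so the significance level by itself says little about the strength of a relationship.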
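The effect of sample size on significance can be illustrated with a minimal sketch. The correlation value (0.02) and the sample sizes below are hypothetical and are not taken from this study; the function name is ours, and the p-value relies on a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r on n samples.

    Uses the t statistic t = r * sqrt((n - 2) / (1 - r^2)) and, as an
    assumption for large n, treats it as standard normal.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Two-sided tail probability of a standard normal beyond |t|.
    return math.erfc(abs(t) / math.sqrt(2.0))


# Hypothetical values: the same weak correlation at growing sample sizes.
for n in (100, 10_000, 300_000):
    p = correlation_p_value(0.02, n)
    print(f"n={n:>7}  r=0.02  p={p:.3g}")
```

The same weak correlation is not significant for the smallest sample but becomes overwhelmingly significant for the largest one, which is why the strength of a correlation should be judged from the coefficient itself rather than from its significance level.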
From a software development point of view, this study should be extended to other programming languages. The conclusions shown here for C might not hold for other languages. Although C is currently the most popular language in the open source community (in terms of available code), other languages are also very popular, are growing much faster than C, and have vast amounts of code available.
Finally, all the source code used for this study was released as open source. Although ...