Measuring the Source Code

For our case study we selected the ArchLinux software distribution (http://archlinux.org), which contains thousands of packages, all open source. ArchLinux is a lightweight GNU/Linux distribution whose maintainers refuse to modify the source code they package, so as to drastically reduce the time that elapses between the official release of a package and its integration into the distribution. There are two ways to install a package in ArchLinux: using the official precompiled packages, or building it from source code with the Arch Build System (ABS).

ABS makes it possible to retrieve the original, pristine source code of all the packages. Other distributions, by contrast, keep their own copies of the packages' source code and often patch it to fit the rest of the distribution. With ABS, we can automatically gather source code from its original location: the upstream projects’ websites and repositories. This ensures that the source code has not been modified, and therefore that the case studies in our sample are independent. As we will show later in the results section, this property of independence is crucial for the validity of the results.

Because of the size of ArchLinux, using it as a case study gives us access to the original source code of thousands of open source projects, through the build scripts used by ABS (see Example 8-1).

Example 8-1. Header of a sample build script ...
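
As a rough sketch of how upstream locations might be read from such a build script, the following Python fragment extracts simple variable assignments and the `source` array from a header written in the usual PKGBUILD style (shell assignments such as `pkgname` and `pkgver`, plus a `source=(...)` array). The sample header, its package name, and its example.org URLs are invented for illustration and are not taken from Example 8-1 or from any real package; the function name `upstream_sources` is likewise hypothetical. Real build scripts are shell scripts and may use constructs that this simple pattern matching does not handle.

```python
import re
import shlex

# Hypothetical build-script header, invented for illustration only.
SAMPLE_HEADER = """\
pkgname=examplepkg
pkgver=1.2.3
pkgrel=1
url="http://example.org/"
source=(http://example.org/releases/$pkgname-${pkgver}.tar.gz)
"""

# NAME=(item item ...) array assignments, possibly spanning several lines.
ARRAY_RE = re.compile(r"^(\w+)=\((.*?)\)", re.MULTILINE | re.DOTALL)
# NAME=value scalar assignments on a single line (value must not start with "(").
SCALAR_RE = re.compile(r"^(\w+)=([^(\n].*)$", re.MULTILINE)
# $name or ${name} references inside a value.
VAR_RE = re.compile(r"\$\{?(\w+)\}?")


def upstream_sources(text):
    """Return (scalars, sources) parsed from a PKGBUILD-style header.

    This is a simplification: it does not run the shell, so command
    substitution, conditionals, and similar constructs are ignored.
    """
    scalars = {}
    for name, value in SCALAR_RE.findall(text):
        parts = shlex.split(value)  # strips surrounding quotes
        scalars[name] = parts[0] if parts else ""

    sources = []
    for name, body in ARRAY_RE.findall(text):
        if name != "source":
            continue
        for item in shlex.split(body):
            # Substitute variables we know; leave unknown ones untouched.
            expanded = VAR_RE.sub(
                lambda m: scalars.get(m.group(1), m.group(0)), item
            )
            sources.append(expanded)
    return scalars, sources


if __name__ == "__main__":
    fields, urls = upstream_sources(SAMPLE_HEADER)
    print(fields["pkgname"], fields["pkgver"])
    for url in urls:
        print(url)
```

Because the sketch only reads text and never executes the script, it is safe to run over many build scripts in bulk, at the cost of missing sources that are computed by shell logic.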
