Code Churn

Software systems evolve over time due to changes in requirements, optimization of code, fixes for security and reliability bugs, etc. Code churn measures the changes made to a component over a period of time and quantifies the extent of this change. It is easily extracted from a system’s change history, as recorded automatically by a version control system. Most version control systems use a file comparison utility (such as diff) to automatically estimate how many lines were added, deleted, and changed by a programmer to create a new version of a file from an old version. These differences are the basis of churn measures.
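To make the diff-based counting concrete, here is a minimal sketch in Python that compares two versions of a file and tallies added, deleted, and changed lines. It uses the standard library's difflib rather than the diff utility itself. The function name line_churn and the rule for splitting a replaced region into changed, added, and deleted lines are assumptions made for illustration, not the procedure of any particular version control system.

```python
import difflib


def line_churn(old_text, new_text):
    """Estimate added, deleted, and changed line counts between two versions.

    Illustrative assumption: in a replaced region, lines are paired one-to-one
    and counted as "changed"; any surplus on the new side counts as "added"
    and any surplus on the old side counts as "deleted".
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    added = deleted = changed = 0

    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_n = i2 - i1  # lines consumed from the old version
        new_n = j2 - j1  # lines produced in the new version
        if tag == "insert":
            added += new_n
        elif tag == "delete":
            deleted += old_n
        elif tag == "replace":
            paired = min(old_n, new_n)
            changed += paired
            added += new_n - paired
            deleted += old_n - paired
        # "equal" regions contribute no churn

    return {"added": added, "deleted": deleted, "changed": changed}
```

Calling line_churn on each pair of consecutive versions in a file's history and summing the results gives one possible absolute churn figure for that file over the period.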

Relative churn measures are absolute churn measures normalized by parameters such as total lines of code, file churn, and file count. In an evolving system, a relative approach is a better way to quantify change, because it expresses the amount of change in proportion to the size of what is being changed. As we show, these relative measures can be devised to cross-check each other so that the metrics do not provide conflicting information. Our basic hypothesis is that code that changes many times pre-release will likely have more post-release defects than code that changes less over the same period of time.
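The normalization idea can be sketched in a few lines of Python. The three ratios below are illustrative choices built from the normalization parameters named above (total lines of code and file count); they are assumptions for the sake of the example, not necessarily the exact measure set used in the analysis, and the function and key names are hypothetical.

```python
def relative_churn(added, changed, deleted, total_loc, files_churned, file_count):
    """Normalize absolute churn counts for one component.

    All ratio definitions here are illustrative assumptions:
      churned_loc_per_loc    = (added + changed) / total_loc
      deleted_loc_per_loc    = deleted / total_loc
      files_churned_per_file = files_churned / file_count
    """
    if total_loc <= 0 or file_count <= 0:
        raise ValueError("total_loc and file_count must be positive")

    churned_loc = added + changed
    return {
        "churned_loc_per_loc": churned_loc / total_loc,
        "deleted_loc_per_loc": deleted / total_loc,
        "files_churned_per_file": files_churned / file_count,
    }


# Hypothetical component: 1,000 lines in 20 files, 4 of which were touched.
example = relative_churn(
    added=30, changed=20, deleted=10,
    total_loc=1000, files_churned=4, file_count=20,
)
```

Under these definitions, a 50-line change counts for far more in a 1,000-line component than in a 100,000-line one, which is the point of normalizing.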

In our analysis, we used the code churn between the release of Windows Server 2003 (W2k3) and the release of the W2k3 Service Pack 1 (W2k3-SP1) to predict the defect density in W2k3-SP1. Using the directly collected churn metrics such as added, modified, and deleted lines of code ...
