Large enterprise systems typically have a long lifespan, are built by many programmers, and are developed and released to the field at regular intervals. Anecdotal reports suggest that once a large system has gone through a few early releases, bugs in later versions tend to be concentrated in relatively small sections of the code. The bug distribution is frequently described as a Pareto distribution, with 80% of the bugs located in just 20% of the code entities, such as files or methods. This concentration can be very helpful to system testers and debuggers: if they can identify which files fall into the 20% that contain problems, they can focus quality assurance efforts such as testing, inspections, and debugging where those efforts are most effective.
Working at AT&T, we have access to a number of large systems with extended lifetimes, so we started a project to:
Identify the parts of the code most likely to have bugs prior to the system testing phase; and
Design and implement a programming environment tool that identifies the most bug-prone parts of the system and presents the information to developers and testers.
The concept of the “most bug-prone parts of the system” is meaningful only if faults really are highly concentrated in certain parts of the system. For these goals to be feasible, we therefore first need to provide evidence that bugs are indeed distributed throughout the code with a highly skewed, Pareto-like distribution.
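As a rough illustration of what such evidence looks like, the following minimal sketch computes the share of all faults that falls in the most fault-prone fraction of files. It assumes per-file fault counts are already available as a mapping from file name to count; the function name `fault_share_of_top_files`, the file names, and the counts are hypothetical and are not data from any of the systems we studied.

```python
import math


def fault_share_of_top_files(fault_counts, top_fraction=0.2):
    """Return the fraction of all faults found in the most fault-prone files.

    fault_counts: dict mapping a file name to its number of faults
                  (hypothetical input format, assumed for this sketch).
    top_fraction: fraction of files to treat as the "top" group.
    """
    if not fault_counts:
        raise ValueError("fault_counts must not be empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    # Rank files from most to least faulty.
    counts = sorted(fault_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0

    # Number of files in the top group, always at least one.
    top_n = max(1, math.ceil(top_fraction * len(counts)))
    return sum(counts[:top_n]) / total


# Made-up counts for a ten-file release, chosen to be highly skewed.
example_counts = {
    "a.c": 45, "b.c": 35, "c.c": 5, "d.c": 5, "e.c": 4,
    "f.c": 3, "g.c": 2, "h.c": 1, "i.c": 0, "j.c": 0,
}
share = fault_share_of_top_files(example_counts, top_fraction=0.2)
print(f"Top 20% of files contain {share:.0%} of the faults")
```

A share close to the top fraction itself (for example, 20% of files holding about 20% of faults) would indicate an even spread, whereas a share near 80% would match the Pareto-like pattern described above.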
We have examined ...