During software development, programmers routinely produce and collect lots of data, all of which can be accessed and analyzed automatically:
The source code for your product is the most important input to your analysis, as it provides you with locations (files, units, classes, components, etc.) that can be associated with various product or process factors.
Collecting data on the execution of the software provides you with profiles, telling you which parts are frequently used and which parts are not.
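As a rough illustration of what such a profile contains, the following sketch counts how often each Python function is called during one run. It assumes a Python program and uses the standard `sys.setprofile` hook; the names `count_calls`, `hot`, `cold`, and `workload` are hypothetical and exist only for this example. A real project would more likely rely on an existing profiler.

```python
import sys
from collections import Counter

def count_calls(func, *args):
    """Run func(*args) and count calls to Python functions by name."""
    counts = Counter()

    def tracer(frame, event, arg):
        # "call" fires for Python-level function calls only;
        # calls into C builtins arrive as "c_call" and are ignored here.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(tracer)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return counts

# Hypothetical workload: one frequently used part, one rarely used part.
def hot():
    pass

def cold():
    pass

def workload():
    for _ in range(3):
        hot()
    cold()

profile = count_calls(workload)
```

Here `profile["hot"]` is 3 and `profile["cold"]` is 1, which is the kind of usage signal that can later be attached to code locations.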
Your product may come with additional documentation, such as design documents or requirements documents; these may also provide important features that explain why code looks the way it does.
The resulting software can be analyzed statically, providing features such as complexity metrics or dependencies.
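To make the idea of a statically computed feature concrete, here is a minimal sketch that approximates a per-function complexity metric for Python source, using the standard `ast` module. The choice of which node types count as decision points is an assumption of this example (established tools differ in the details), and `complexity_per_function` and `SAMPLE` are hypothetical names.

```python
import ast
from typing import Dict

# Node types counted as decision points (an assumption; tools differ).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def complexity_per_function(source: str) -> Dict[str, int]:
    """Return 1 + number of decision points for each function in source."""
    tree = ast.parse(source)
    result = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Note: branches inside nested functions also count
            # toward the enclosing function in this simple sketch.
            branches = sum(isinstance(n, BRANCH_NODES)
                           for n in ast.walk(node))
            result[node.name] = 1 + branches
    return result

SAMPLE = '''
def simple(x):
    return x + 1

def branchy(x):
    if x > 0 and x < 10:
        for i in range(x):
            if i % 2:
                return i
    return -1
'''

metrics = complexity_per_function(SAMPLE)
```

For this sample, `metrics` is `{"simple": 1, "branchy": 5}`: `branchy` contains two `if` statements, one `for` loop, and one boolean operator.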
Version archives record the changes made to the product, including who made each change, when, where, and why. They can tell you a lot about a project’s history, provided that the stored changes are logically separated and the stored rationales are written in a systematic and consistent manner.
To map problems to locations, it is important to have a problem database that describes all the problems that ever occurred and tracks their life cycles.
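One common way to map problems to locations is to look for problem identifiers in the change descriptions stored in the version archive. The sketch below assumes that commit messages mention problems as "bug 101", "issue 101", or "fixes #101"; that convention, the sample data, and the names `BUG_REF` and `problems_per_file` are all assumptions made for illustration, and a real project would need a pattern matching its own conventions.

```python
import re
from collections import defaultdict

# Assumed convention: "bug 101", "issue 101", "fix #101", "fixes #101", ...
BUG_REF = re.compile(r"\b(?:bug|issue|fix(?:es|ed)?)\s*#?(\d+)",
                     re.IGNORECASE)

def problems_per_file(commits):
    """Map each changed file to the problem IDs named in its commits.

    commits: iterable of (message, list_of_changed_files) pairs.
    """
    hits = defaultdict(set)
    for message, files in commits:
        for bug_id in BUG_REF.findall(message):
            for path in files:
                hits[path].add(int(bug_id))
    return {path: sorted(ids) for path, ids in hits.items()}

# Hypothetical commit records, standing in for a real version archive.
COMMITS = [
    ("Fix bug 101: null check in parser", ["src/parser.py"]),
    ("Refactor logging", ["src/log.py"]),
    ("Fixes #102 and issue 101 in tokenizer",
     ["src/parser.py", "src/lexer.py"]),
]

mapping = problems_per_file(COMMITS)
```

In this example both `src/parser.py` and `src/lexer.py` map to problems 101 and 102, while `src/log.py` does not appear because its commit names no problem. The quality of such a mapping depends directly on how consistently developers reference problems in their change descriptions.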
Finally, you may have social data: a partitioning of developers into projects or groups, emails or other messages between developers, and even billing or effort data. With such data, you can, for instance, determine how effort maps to ...