In the course of developing our fault prediction technology and tool, we carried out an extensive series of empirical studies. Because it is fairly unusual to see repeated large industrial empirical studies that follow multiple systems for multiple years, readers may be interested to hear which parts of the work are most challenging and why, so that they can understand why this sort of evidence is so critical even though it is difficult to provide.
Many people seem to assume that, because we work for a large company with hundreds of millions or perhaps billions of lines of continuously running code, obtaining systems to study is trivial. Nothing could be further from the truth, especially in the early stages of the research. It is something of a chicken-and-egg problem. System owners are reluctant to grant access to their systems for a number of reasons, most of which are perfectly sensible from their point of view.
The first thing they worry about is that you will take up their time, and they typically have very tight deadlines. If they spend time answering your questions, they see no direct benefit and have less chance of deploying their system on schedule. They are therefore unwilling to get involved with what they perceive to be high-risk research projects. Most research projects are in fact high-risk, because they never reach a stage at which practitioners can use them.
System owners may also fear that you will modify the system ...