Investigating the Practice of Code Cloning

The unspoken and largely unchallenged presumption among software engineers is that code cloning is a bad thing. Always. OK, maybe you can cheat a little in the short term, but in the long term, it’s a bad idea. In fact, Kent Beck says precisely this in his chapter on “code smells” in Fowler’s Refactoring:

“Number one in the stink parade is duplicated code. If you see the same code structure in more than one place, you can be sure that your program will be better if you find a way to unify them.”

Our experience and intuition said this was too simplistic a view. For example, our colleague Jim Cordy reminded us that the engineering view of using existing solutions was pretty different: in languages such as FORTRAN and COBOL, where the syntax is awkward and the ability to form high-level abstractions is limited, existing solutions are often treated as tools to be reused and adapted for new situations. (This sounds like what a library is for, but often these kinds of solutions can’t be packaged up so neatly to make a library.)

So we decided to go in a different direction: what are the characteristics of code cloning in industrial software systems? What patterns exist? Can we identify them using static analysis? Can we make judgment calls about when cloning might be a reasonable, and even advantageous, design decision? Armed with knowledge and empirical studies, can we use code duplication as a principled engineering tool?

We set out to create a catalog ...
