Now that we understand the importance of the future, and that there are some things we don’t and can’t know about it, what can we know about it?
Well, one thing you can be sure of is that as time goes on, the environment around your software is going to change. Nothing stays the same forever. This means that your software will have to change in order to adapt to the environment around it.
This gives us the Law of Change:
The longer your program exists, the more probable it is that any piece of it will have to change.
As you go into an infinite future, you start tending toward a 100% probability that every single piece of your program will have to change. In the next five minutes, probably no part of your program will have to change. In the next 10 days, a small piece of it might. In the next 20 years, probably a majority of it (if not all of it) will have to change.
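One way to picture this trend is a toy model. The model is an assumption made for this sketch, not something measured: suppose each piece of the program has the same small, independent chance of needing a change in any given year. The function name `chance_of_change` and the 10% yearly rate are invented for illustration.

```python
def chance_of_change(p_per_year: float, years: float) -> float:
    """Probability that a piece has had to change at least once.

    Toy model (an assumption, not data): every year carries the same
    independent chance p_per_year of forcing a change.
    """
    if not 0.0 <= p_per_year <= 1.0:
        raise ValueError("p_per_year must be between 0 and 1")
    if years < 0:
        raise ValueError("years must not be negative")
    # The piece survives unchanged only if it survives every single year.
    return 1.0 - (1.0 - p_per_year) ** years


if __name__ == "__main__":
    # 0.10 is a made-up illustrative rate, not a measurement.
    for years in (1, 5, 20, 100):
        print(years, round(chance_of_change(0.10, years), 3))
```

Whatever rate you plug in, as long as it is above zero, the result creeps toward 1 as the number of years grows, which is the shape the Law of Change describes.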
It’s hard to predict exactly what will change, and why. Maybe you wrote a program for 4-wheeled cars, but in the future everybody will drive 18-wheel trucks. Maybe you wrote a program for high school students, but high school education will get so bad that the students can’t understand it anymore.
The point is, you don’t have to try to predict what will change; you just need to know that things will change. Write your software so that it’s as flexible as reasonably possible, and you’ll be able to adapt to whatever future changes do come along.
Let’s look at some data on how a real-world program changed over time. There are hundreds of files in this particular program, but the details for each file won’t fit on this page, so four files have been chosen as examples. Details on these files are given in Table 4-1.
Table 4-1. Changes in files over time
Period analyzed:
- File 1: 5 years, 2 months
- File 2: 8 years, 3 months
- File 3: 13 years, 3 months
- File 4: 13 years, 4 months
In this table:
- Period analyzed
The time period over which the file existed.
- Lines originally
How many lines were in the file when it was originally written.
- Unchanged lines
How many lines are the same now as they were when the file was originally written.
- Lines now
How many lines there are in the file now, at the end of the analysis period.
- Grew by
The difference between “Lines now” and “Lines originally.”
- Times changed
The total number of times a programmer made some set of changes to the file (where one set of changes may involve changes to many lines). Usually one set of changes will represent one bug fix, one new feature, etc.
- Lines added
How many times, over the history of the file, a new line was added.
- Lines deleted
How many times, over the history of the file, an existing line was deleted.
- Lines modified
How many times, over the history of the file, an existing line was changed (but not newly added or deleted).
- Total changes
The sum of the “Lines added,” “Lines deleted,” and “Lines modified” counts for that file.
- Change ratio
“Total changes” divided by “Lines originally”; that is, how many times larger the total changes count is than the original line count.
When we refer to “lines” in the above descriptions, that includes every line in the files: code, comments, documentation, and empty lines. If you were to do the analysis without counting comments, documentation, and empty lines, one major difference you would see is that the “Unchanged lines” count would become much smaller in proportion to the other numbers. (In other words, the unchanged lines are nearly always comments, documentation, or empty lines.)
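The derived columns follow from the counted ones by simple arithmetic. Here is a minimal sketch of that arithmetic; the class name `FileHistory` and all of the numbers in it are invented for illustration and are not a row of Table 4-1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHistory:
    """Counted columns for one file (hypothetical container)."""
    lines_originally: int
    lines_now: int
    lines_added: int
    lines_deleted: int
    lines_modified: int

    @property
    def grew_by(self) -> int:
        # "Grew by": difference between "Lines now" and "Lines originally".
        return self.lines_now - self.lines_originally

    @property
    def total_changes(self) -> int:
        # "Total changes": added + deleted + modified.
        return self.lines_added + self.lines_deleted + self.lines_modified

    @property
    def change_ratio(self) -> float:
        # "Change ratio": total changes relative to the original size.
        if self.lines_originally <= 0:
            raise ValueError("lines_originally must be positive")
        return self.total_changes / self.lines_originally


# Invented numbers, for illustration only.
example = FileHistory(
    lines_originally=200,
    lines_now=350,
    lines_added=400,
    lines_deleted=250,
    lines_modified=550,
)

if __name__ == "__main__":
    print(example.grew_by, example.total_changes, example.change_ratio)
```

In this made-up case the file grew by only 150 lines, yet 1,200 line changes were made to it, six times its original size.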
The most important thing to realize from this table is that a lot of change happens in a software project. It becomes more and more likely that any particular line of code will change as time goes on, but you can’t predict exactly what is going to change, when it’s going to change, or how much it will have to change. Each of these four files changed in very different ways (you can see this even just looking at the numbers), but they all changed a significant amount.
There are a few other interesting things about the numbers, as well:
Looking at the change ratio, we see that more work was put into changing each file than writing it originally. Obviously, line counts aren’t a perfect estimate of how much work was actually done, but they do give us a general idea. Sometimes the ratio is huge—for example, file 4 had 36 times as many total changes as it did original lines.
The number of unchanged lines in each file is small compared to its “Lines originally” count, and even smaller compared to its “Lines now” count.
A lot of change can happen to a file even if it only gets a little bit bigger over time. For example, file 3 grew by only 161 lines over 13 years, but during that time the total changes count reached 3,047 lines.
The total changes count is always larger than the lines now count. In other words, once a file has been around for long enough, more line changes have been made to it than it has lines.
In file 3, the number of lines modified is larger than the number of lines in the original file plus the number of lines added. That file’s lines have been modified more often than new lines have been added. In other words, some lines of that file have changed over and over. This is common on projects with a long lifetime.
The above points aren’t all that could be learned here—there is a lot more interesting analysis that could be done on these numbers. You’re encouraged to dig into this data (or work out similar numbers for your own project) and see what else you can learn.
Another good learning experience is looking over the history of changes made to one particular file. If you have a record of every change made to files in your program, and you have one file that’s been around for a long time, try looking at each change made over its lifetime. Ask yourself whether you could have predicted that change when the file was originally written, and whether the file could have been written better originally to make the change simpler. Generally, try to understand each change and see if you can learn anything new about software development from doing so.
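If your record of changes happens to live in Git (an assumption; the text above does not name a tool), a small script can give you rough counts to start from. The sketch below parses the text printed by `git log --follow --numstat --format=%H -- <path>`. The function name `summarize_numstat` and the sample log are invented for illustration. Note that Git reports a modified line as one deletion plus one addition, so there is no separate "lines modified" count here.

```python
def summarize_numstat(log_text: str) -> dict:
    """Summarize `git log --numstat --format=%H` output for one file."""
    times_changed = 0
    lines_added = 0
    lines_deleted = 0
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) == 3:
            # A numstat row: added<TAB>deleted<TAB>path.
            added, deleted, _path = fields
            # Binary files show "-" instead of numbers; skip those rows.
            if added.isdigit() and deleted.isdigit():
                lines_added += int(added)
                lines_deleted += int(deleted)
        else:
            # Any other non-empty line is a commit hash: one set of changes.
            times_changed += 1
    return {
        "times_changed": times_changed,
        "lines_added": lines_added,
        "lines_deleted": lines_deleted,
    }


# Made-up sample in the shape Git prints (hashes shortened to c1..c3).
SAMPLE_LOG = (
    "c3\n\n10\t2\tprocess.py\n"
    "c2\n\n0\t5\tprocess.py\n"
    "c1\n\n40\t0\tprocess.py\n"
)

if __name__ == "__main__":
    print(summarize_numstat(SAMPLE_LOG))
```

The counts only tell you where to look. The learning comes from reading the individual changes themselves.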
There are three broad mistakes that software designers make when attempting to cope with the Law of Change, listed here in order of how common they are:
1. Writing code that isn’t needed
2. Not making the code easy to change
3. Being too generic
There is a popular rule in software design today called “You Ain’t Gonna Need It,” or YAGNI for short. Essentially, this rule states that you shouldn’t write code before you actually need it. It’s a good rule, but it’s misnamed. You actually might need the code in the future, but since you can’t predict the future you don’t know how the code needs to work yet. If you write it now, before you need it, you’re going to have to redesign it for your real needs once you actually start using it. So save yourself that redesign time, and simply wait until you need the code before you write it.
Another risk of writing code before you need it is that unused code tends to develop “bit rot.” Since the code never runs, it might slowly become out of sync with the rest of your system and thus develop bugs, and you’ll never know. Then, when you start to use it, you’ll have to spend time debugging it. Or, even worse, you might trust the never-before-used code and not check it, and it may cause bugs for users. In fact, the rule here should actually be expanded to read:
Don’t write code until you actually need it, and remove any code that isn’t being used.
That is, you should also get rid of any code that is no longer needed. You can always add it back later if it becomes needed again.
There are lots of reasons people think that they should write code before it’s needed, or keep around code that isn’t being used. First off, some people believe they can get around the Law of Change by programming every feature that any user could ever possibly need, right now. Then, they think, the program won’t have to be changed or improved in the future. But this is wrong. It’s not possible to write a system that will never change, as long as that system continues to have users.
Others believe that they are saving themselves time in the future by doing some extra work now. In some cases that philosophy works, but not when you’re writing code that isn’t needed. Even if that code ends up being needed in the future, you will almost certainly have to spend time redesigning it, so you’re actually wasting time.
One of the great killers of software projects is what we call “rigid design.” This is when a programmer designs code in a way that is difficult to change. There are two ways to get a rigid design:
1. Make too many assumptions about the future.
2. Write code without enough design.
The rule used to avoid rigid design is:
Code should be designed based on what you know now, not on what you think will happen in the future.
Design based only on your immediate, known requirements, without excluding the possibility of future requirements. If you know for a fact that you need the system to do X, and just X, then just design it to do X, right now. It might do other things that aren’t X in the future, and you should keep that in mind, but for now the system should just do X.
When designing like this, it also helps to keep your individual changes small. When you only have to make a small change, it’s easy to do some real design on it.
This isn’t to say that planning is bad. A certain amount of planning is very valuable in software design. But even if you don’t write out detailed plans, you’ll be fine as long as your changes are always small and your code stays easily adaptable for the unknown future.
When faced with the fact that their code will change in the future, some developers attempt to solve the problem by designing a solution so generic that (they believe) it will accommodate every possible future situation. We call this “overengineering.”
The dictionary defines overengineering as a combination of “over” (meaning “too much”) and “engineer” (meaning “design and build”). So, per the dictionary, it means designing or building too much for your situation.
Wait—designing or building too much? What’s “too much”? Isn’t design a good thing?
Well, yes, most projects could use more design, as we saw in Example: Code Without Enough Design. But once in a while, somebody really gets into it and just goes overboard—sort of like building an orbital laser to destroy an anthill. An orbital laser is an amazing engineering feat, but it costs an enormous amount of money, takes far too long to build, and is a maintenance nightmare. Can you imagine having to go up there and fix it when it breaks?
There are several other problems with overengineering:
You can’t predict the future, so no matter how generic your solution is, it will not be generic enough to satisfy the actual future requirements you will have.
When your code is too generic, it often doesn’t handle specifics very well from the user’s perspective. For example, say you design some code that treats all input the same—it’s all just bytes. Sometimes this code processes text, and sometimes it processes pictures, but all it knows is that it’s getting bytes. In a way, this is a good design: the code is simple, self-contained, small, etc.
But then you make sure that no part of your code distinguishes between pictures and text. This is too generic. When the user passes in a bad picture, the error she gets is, “You passed in bad bytes.” It should have said, “You passed in a bad picture,” but your code is so generic that it can’t tell the user that. (There are lots of ways that generic code can fall short when put to specific uses; this is just an example.)
Being too generic involves writing a lot of code that isn’t needed, which brings us back to our first flaw.
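The bytes example above can be sketched as follows. Everything here is hypothetical: the function names are invented, and the sketch assumes that "picture" means a PNG file and "text" means UTF-8, neither of which the example specifies.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of every PNG file


def looks_like_text(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 (assumed meaning of "text")."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def load_generic(data: bytes) -> bytes:
    """Too generic: accepts text or pictures but never knows which was meant."""
    if not (looks_like_text(data) or data.startswith(PNG_SIGNATURE)):
        # All this code can say is that the bytes are bad.
        raise ValueError("You passed in bad bytes.")
    return data


def load_picture(data: bytes) -> bytes:
    """Specific: the caller said this is a picture, so the error can say so."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("You passed in a bad picture.")
    return data


if __name__ == "__main__":
    damaged = b"\x89PNX not really a picture"
    for loader in (load_generic, load_picture):
        try:
            loader(damaged)
        except ValueError as error:
            print(loader.__name__, "->", error)
```

Both functions reject the same damaged input. Only the one that knows what the user was trying to do can explain the problem in the user's terms.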
In general, when your design makes things more complex instead of simplifying things, you’re overengineering. That orbital laser would hugely complicate the life of a person who just needed to destroy some anthills, whereas some simple ant poison would greatly simplify that person’s life by removing the ant problem (assuming it worked).
Being generic with the right things, in the right ways, can be the foundation of a successful software design. However, being too generic can be the cause of untold complexity, confusion, and maintenance effort. The rule for avoiding this flaw is similar to the rule for avoiding rigid designs:
Be only as generic as you know you need to be right now.
There is a method of software development that avoids the three flaws by its very nature, called “incremental development and design.” It involves designing and building a system piece by piece, in order.
It is easiest to explain by example. Here’s how we would use it to develop a calculator program that needs to add, subtract, multiply, and divide:
1. Plan a system that does only addition and nothing else.
2. Implement that system.
3. Fix up the now-existing system’s design so that it is easy to add a subtraction feature.
4. Implement the subtraction feature in the system. Now we have a system that does only addition and subtraction, and nothing else.
5. Fix up the system’s design again so that it is easy to add a multiplication feature.
6. Implement the multiplication feature in the system. Now we have a system that does addition, subtraction, multiplication, and nothing else.
7. Fix up the system’s design again so that it is easy to add the division feature. (At this point, this should take little or no effort, because we already improved the design before implementing subtraction and multiplication.)
8. Implement the division feature in the system. Now we have the system we started out intending to build, with an excellent design that suits it well.
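To make the steps above concrete, here is one way the calculator might look partway through and at the end. This is only a sketch under assumptions: the lookup table `OPERATIONS` and the function `calculate` are invented names, and a table of operations is just one possible outcome of the "fix up the design" steps, not a prescribed one.

```python
import operator
from typing import Callable, Dict

# State after the subtraction step: fixing up the design for subtraction
# replaced a hard-coded addition with a table of operations.
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
}


def calculate(left: float, symbol: str, right: float) -> float:
    """Apply the operation named by symbol to the two numbers."""
    try:
        operation = OPERATIONS[symbol]
    except KeyError:
        raise ValueError(f"Unsupported operation: {symbol!r}") from None
    return operation(left, right)


# The multiplication and division steps: with the table in place,
# each new feature is a one-line addition.
OPERATIONS["*"] = operator.mul
OPERATIONS["/"] = operator.truediv

if __name__ == "__main__":
    print(calculate(2, "+", 3))
    print(calculate(7, "/", 2))
```

The table did not exist when the system only did addition. It appeared when subtraction was actually needed, which is why the later design steps take little or no effort.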
This method of development requires less time and less thought than planning the entire system up front and building it all at once. It may not be easy at first if you are used to other development methods, but it will become easy with practice.
The tricky part of using this method is deciding on the order of implementation. In general, you should pick whatever is simplest to work on at each step, when you get there. We picked addition first because it was the simplest of all four operations overall, and subtraction second because it logically built on addition in a very simple way. We could possibly have picked multiplication second, since multiplication is just the action of doing addition many times. The only thing we would not have picked second is division, because stepping from addition to division is too far of a logical jump—it’s too complex. On the other hand, stepping from multiplication to division at the end was really very simple, so that was a good choice.
Sometimes you may even need to take a single feature and break it down into many small, simple, logical steps so that it can be implemented easily.
This is actually a combination of two methods: one called “incremental development” and another called “incremental design.” Incremental development is a method of building up a whole system by doing work in small pieces. In our list, each step that started with “Implement” was part of the incremental development process. Incremental design is similarly a method of creating and improving the system’s design in small increments. Each step that started with “Fix up the system’s design” or “Plan” was part of the incremental design process.
Incremental development and design is not the only valid method of software development, but it is one that definitely prevents the three flaws outlined in the previous section.
This is the story of a file called process_bug.cgi from a product called “Bugzilla.” The story has been simplified somewhat from what actually happened, but the numbers (in terms of lines of code and the time it took to fix it) are roughly accurate. If you want to see the entire history of the redesign project and how it was done, you can read the records listed here: https://bugzilla.mozilla.org/showdependencytree.cgi?id=367914&hide_resolved=0.