Chapter 4. Control Code Versions and Development Branches

I was working on the proof of one of my poems all the morning, and took out a comma. In the afternoon I put it back again.

Oscar Wilde

Best Practice:

  • Use a standard version control system and keep track of development branches.

  • Integrate your code regularly and commit changes both regularly and specifically.

  • This improves the development process because developers can work in isolation without drifting apart.

In this chapter we apply the ideas of GQM to version control. Bugs and regressions (recurring defects) occur regularly during development and maintenance of any codebase. To solve them, you will need to reanalyze code, and you may need to revert some code or configuration to an earlier version. Consider what happens if there is no version control and all developers can work on only one version. It will be hard to see who is changing what, and hard to keep developers from breaking each other’s code.

Version control is a solution to this. It does what it says: controlling versions, which allows you to divide work among developers. In its most basic form, a version control system tracks changes by labeling them with the person who made the change, and a timestamp of that change. Having the version history available allows you to revert to an earlier version when necessary.

To better understand what version control systems do, let us start at the beginning. There is one main version of the code that is leading for what is being put into production, referred to as the trunk (main line). When developers make changes, they can commit changes and additions to the trunk. When multiple developers work on the trunk, they can get in each other’s way. To avoid this, variations can be developed in branches (a branch being a new, parallel version of the software). Branches can later be merged into the trunk.

Therefore version control systems generally have integration functionality for merging different versions. In that way, developers can work on isolated parts of the source code. Usually, the merging of source code (which is text-based) proceeds automatically, but when there is a conflict, it needs to be resolved manually.

There can be a central repository that holds the trunk (a centralized version control system). In that case, developers make a copy of that trunk to work on and commit to the central version control system. With distributed version control, each developer has a local repository, and developers share changes with one another to create a version (pushing their own changes and pulling the changes of others).

There are several version control systems available, such as Git, Subversion, and Mercurial, each with a slightly different vocabulary and different mechanisms. A notable difference is that Git and Mercurial allow developers to commit changes into local (individual) branches on their own machines, which can later be merged with the main repository. In Subversion, changes are always committed directly to the central repository.

Although version control has the most visible benefits for source code, you should put all versionable parts of the software in version control, including test cases, database configuration, deployment scripts, and the like. That way, you can re-create the different testing and production environments more easily. Documentation can be under version control, and sometimes external libraries (when you cannot rely only on the versioning of the library provider).

Important

Do not put generated code into version control. You do not need to, because the build system will generate that code for you. If generated code is actually maintained by hand, it should be versioned and be part of version control. But you should refrain from adjusting generated code, as doing so will lead to problems when you need to regenerate it.

There are situations in which libraries need to be included in version control (e.g., for audits to ensure that known-working versions are used). A more thorough discussion on using third-party code appears in Chapter 10.

Motivation

Different version control systems share common advantages: they allow the development team to track changes over time, they use branching to allow developers to work independently on variations of the same codebase, and they merge files automatically when versions differ, leaving only genuine conflicts to be resolved manually.

Tracking Changes

Tracking changes with a version control system gives you the ability to go back in time. When things go right, there may be no reason to do so. However, when functionality breaks, comparing old versions is a good tactic for finding the cause of the issue (instead of reanalyzing the whole codebase). The developer can revert the source code in their working copy to the moment when the bug was introduced. This comparison helps with fixing the bug, or with simply replacing the new version with the old one. Then the new (local) version can be merged back into the trunk.

Version Control Allows Independent Modification

Independent modification in part of the source code avoids conflicts that happen when different pieces are adjusted at the same time. So when two developers need to implement different functionalities but need the common codebase for testing, for example, they can create two isolated branches of the same codebase that they can modify independently. In that way, the code of the one developer will not interfere with the work of the other. When one of the branches functions satisfactorily, it can be pushed back and merged into the main development line. Then the changed functionality becomes available to the whole team.

Version Control Allows Automatic Merging of Versions

Every time developers need to exchange their work, source files need to be merged. Merging source code versions manually is notoriously difficult if they are very different from each other. Luckily, version control can do most of the merging automatically by analyzing the difference between a modified file and the original. It is only when there is no definite way to combine the two files that a problem (a merge conflict) occurs. Merge conflicts need to be dealt with manually because they indicate that two people made different changes to the same line in a file, and the team still needs to agree on the modification that is most suitable.
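The merge decision described above can be illustrated with a deliberately simplified sketch. It assumes that all three versions of a file (the common original and the two modified copies) have the same number of lines, which real version control systems do not require; the function name `merge_lines` and the sample file content are hypothetical.

```python
from typing import List, Optional, Tuple


def merge_lines(
    base: List[str], ours: List[str], theirs: List[str]
) -> Tuple[List[Optional[str]], List[int]]:
    """Line-by-line three-way merge.

    Simplifying assumption: all three versions have the same number of
    lines. Returns the merged lines (None marks an unresolved line) and
    the indexes of the conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal line counts")

    merged: List[Optional[str]] = []
    conflicts: List[int] = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)      # both sides agree (or neither changed)
        elif o == b:
            merged.append(t)      # only the other developer changed it
        elif t == b:
            merged.append(o)      # only we changed it
        else:
            merged.append(None)   # both changed it differently: conflict
            conflicts.append(index)
    return merged, conflicts


# Made-up sample: two developers change different lines of one file.
base = ["timeout = 30", "retries = 3", "debug = false"]
ours = ["timeout = 60", "retries = 3", "debug = false"]
theirs = ["timeout = 30", "retries = 3", "debug = true"]
merged, conflicts = merge_lines(base, ours, theirs)
# merged == ["timeout = 60", "retries = 3", "debug = true"], conflicts == []
```

If both developers had changed the first line to different values, index 0 would be reported as a conflict and left for the team to resolve by hand.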

How to Apply the Best Practice

Consider that changes and additions are much easier to manage when done in small iterations. Each iteration moves the software forward with a small step. Difficulties arise when you want to make large and ambitious changes infrequently.

Therefore, there are two main principles you should adhere to:

  • Commit specifically and regularly

  • Integrate your code regularly, by building the full product from source code and testing it

We will show how you can use different metrics of the version control system to measure the quality of your development process.

Commit Specifically and Regularly

In many cases, team members will not be aware of what other developers are working on, so it is important to record this when you commit changes to version control. Every commit should therefore be specific to one piece of functionality or, when you have an issue tracking system in place, linked to precisely one issue. Commits should also be made on a regular basis: this helps to keep track of the flow of work and of the progress toward the team goal. Moreover, you reduce the risk of merge conflicts by committing your changes to the version control server regularly.

Important

Note that “keeping changes small” implies that you also divide development work into small parts.

Typical indicators for this principle are the commit frequency of team members and the share of commit messages that are linked to the issue tracker. You could further measure the build success rate to determine how quickly issues are solved, and in how many cases the issues need extra work. This could serve as an alternative measure of team velocity.
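As a rough illustration, both indicators could be derived from an export of the commit log. The sketch below assumes that commits are available as simple records and that ticket identifiers look like `PROJ-42`; the `Commit` type, the pattern, and the sample data are all hypothetical.

```python
import re
from collections import Counter
from datetime import date
from typing import Dict, List, NamedTuple


class Commit(NamedTuple):
    author: str
    day: date
    message: str


# Assumed ticket format such as "PROJ-42"; adapt to your issue tracker.
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")


def commits_per_working_day(
    commits: List[Commit], working_days: int
) -> Dict[str, float]:
    """Average number of commits per working day, per author."""
    if working_days <= 0:
        raise ValueError("working_days must be positive")
    counts = Counter(commit.author for commit in commits)
    return {author: n / working_days for author, n in counts.items()}


def share_linked_to_issue(commits: List[Commit]) -> float:
    """Fraction of commits whose message mentions a ticket identifier."""
    if not commits:
        return 0.0
    linked = sum(1 for c in commits if TICKET_PATTERN.search(c.message))
    return linked / len(commits)


# Made-up sample covering one five-day working week.
commits = [
    Commit("ann", date(2024, 1, 8), "PROJ-1 Fix login timeout"),
    Commit("ann", date(2024, 1, 9), "PROJ-2 Add regression test"),
    Commit("bob", date(2024, 1, 9), "cleanup"),
    Commit("ann", date(2024, 1, 10), "PROJ-2 Refactor session handling"),
]
frequency = commits_per_working_day(commits, working_days=5)
linked_share = share_linked_to_issue(commits)
```

Note that the pattern only checks the form of an identifier; it does not verify that the ticket exists in the issue tracker.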

Integrate Your Code Regularly

Besides committing new code on a regular basis, it is vital to integrate code between different branches of the codebase as often as possible, preferably daily. This is because the later you integrate your code, the more likely it is that there will be merge conflicts or bugs introduced.

There should be a proper balance between the time a programmer operates independently on a branch and the number of merge conflicts. An indication of this is (average) branch lifespan or the relation between branch lifespan and the number of related merge conflicts.
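A minimal sketch of these two indications follows, assuming you can export the creation time, merge time, and number of merge conflicts for each branch. The `Branch` record and the sample values are hypothetical, and "conflicts per branch-day" is only one possible way to relate lifespan to conflicts.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Branch(NamedTuple):
    name: str
    created: datetime
    merged: datetime
    merge_conflicts: int


def average_lifespan(branches: List[Branch]) -> timedelta:
    """Mean time between creating a branch and merging it."""
    if not branches:
        raise ValueError("need at least one merged branch")
    total = sum((b.merged - b.created for b in branches), timedelta())
    return total / len(branches)


def conflicts_per_branch_day(branches: List[Branch]) -> float:
    """Merge conflicts per day of branch lifetime, over all branches."""
    seconds = sum((b.merged - b.created).total_seconds() for b in branches)
    if seconds <= 0:
        raise ValueError("total branch lifetime must be positive")
    days = seconds / 86400  # seconds in a day
    return sum(b.merge_conflicts for b in branches) / days


# Made-up sample: a one-day branch and a three-day branch.
branches = [
    Branch("fix-login", datetime(2024, 1, 8), datetime(2024, 1, 9), 0),
    Branch("new-report", datetime(2024, 1, 8), datetime(2024, 1, 11), 4),
]
lifespan = average_lifespan(branches)
conflict_rate = conflicts_per_branch_day(branches)
```

Tracking both values per sprint makes the trend visible, which matters more than any single measurement.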

Having long-lived branches (that exist for, say, more than two sprints) poses problems. In general, it makes truly isolated maintenance more difficult, and in some cases unlikely. Long-lived branches heighten the risk of merge conflicts, because the longer they live, the more they tend to diverge from the trunk. When branches start to evolve into notably different functionality, they may even become impossible to merge and become forks (independent versions of the software). It is easy to imagine the panic if one fast-paced developer who works independently tries to merge a month of work one day before the release deadline.

Controlling Versions in Practice

Keeping in mind that effectively managing code variations requires avoiding long-lived branches, you also wish to confirm whether the team works more productively with the help of version control. Therefore, you would expect that issue resolution time should not increase when applying version control. Assuming that the goals are formulated from the viewpoint of the team lead, you can come up with the following GQM model:

  • Goal A: To manage code variations by preventing long-lived development branches.

    • Question 1: How much time passes between commits?

      • Metric 1a: Number of developers that did not commit in the last 24 hours. This is a simple metric that serves as an indicator of whether long-lived branches are developing. Of course, developers could be working on other things, or be out of the office, and so not be committing code. Those explanations are easy. The not-so-obvious cases are the interesting ones. Expect this metric to be fairly stable over time, yet it will almost never be zero.

      • Metric 1b: Average branch lifespan. This metric checks whether branch lifespan stays limited, to avoid the risks of long-lived branches. Expect a downward trend toward the time it takes to implement one work package (task, bug, user story, etc.). In most cases this means resolving the full work package, and ideally that work package is sized so that it can be completed in a day. The metric therefore also depends on how well issues are registered.

      • Metric 1c: Commit frequency. The same reasoning applies here. A high commit frequency signals that work packages are small enough to be done in a short period and that branches are not open indefinitely. Expect the time between commits to trend downward, toward at least daily commits. Note that in some situations this metric may be inaccurate, for example when bug fixes require specialized, frequent commits, such as in high-security and mission-critical environments.

  • Goal B: To understand the causes influencing team productivity (by analyzing issue resolution time).

    • Question 2: What is the typical issue resolution time for the system?

      • Metric 2a: Average issue resolution time, calculated as the total issue resolution time divided by number of issues. Expect a downward trend toward a relatively stable minimum amount of effort where the team can resolve issues efficiently.

      • Metric 2b: Percentage of issues that are both raised and solved within the same sprint. A high percentage could signify that bugs are processed immediately instead of postponed, which is desired behavior. Expect an upward trend.

      • Metric 2c: Average issue resolution time between different levels of urgency. The distinction is relevant because different levels of urgency may explain different resolution times. Expect a downward trend for each category.

    • Question 3: Are enough issues resolved within a reasonable amount of time? In this case, assume 24 hours is quick and 72 hours is acceptable.

      • Metric 3a: Percentage of issues resolved within 24 hours. Expect an upward trend until a stable rate has been achieved.

      • Metric 3b: Percentage of issues resolved within 72 hours. Expect an upward trend until a stable rate has been achieved.
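Metrics 2a, 3a, and 3b reduce to simple arithmetic over issue timestamps. The sketch below assumes an export from the issue tracker with an opening and a resolution time per issue; the `Issue` record, the identifiers, and the sample values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Issue(NamedTuple):
    key: str            # hypothetical issue identifier
    opened: datetime
    resolved: datetime


def average_resolution_time(issues: List[Issue]) -> timedelta:
    """Metric 2a: total resolution time divided by number of issues."""
    if not issues:
        raise ValueError("need at least one resolved issue")
    total = sum((i.resolved - i.opened for i in issues), timedelta())
    return total / len(issues)


def share_resolved_within(issues: List[Issue], limit: timedelta) -> float:
    """Metrics 3a and 3b: fraction of issues resolved within the limit."""
    if not issues:
        raise ValueError("need at least one resolved issue")
    quick = sum(1 for i in issues if i.resolved - i.opened <= limit)
    return quick / len(issues)


# Made-up sample: four issues resolved after 12, 20, 48, and 100 hours.
start = datetime(2024, 1, 8, 9, 0)
issues = [
    Issue("A-1", start, start + timedelta(hours=12)),
    Issue("A-2", start, start + timedelta(hours=20)),
    Issue("A-3", start, start + timedelta(hours=48)),
    Issue("A-4", start, start + timedelta(hours=100)),
]
average = average_resolution_time(issues)                       # 45 hours
within_24h = share_resolved_within(issues, timedelta(hours=24))  # 0.5
within_72h = share_resolved_within(issues, timedelta(hours=72))  # 0.75
```

Metric 2c follows the same pattern: group the issues by urgency level first and call `average_resolution_time` on each group.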

Although issue resolution time is a common metric, it is important to realize its limitations. The metric could be distorted when issues are defined in ever-smaller work packages (which makes it seem that efficiency rises), or when team members close unsolved issues instead of resolving them in order to clear the backlog. The latter may be a case of the pitfall “treating the metric.” This is particularly a problem if there is an incentive to make the metric look more favorable, for example when team members’ performance is evaluated (mainly) with this metric. Therefore, make sure that you are not using only one metric (the pitfall “one track metric”), especially when that metric is important in evaluating performance. Relying on a single metric invites treating the metric, which causes unintended behavior.

Also note that productivity is not straightforward to define. It definitely is not only about lines of code written. Quality of the written code is a more serious consideration. For considerations on code quality, refer to “Related Books”.

  • Goal B (continued)

    • Question 4: How productive are we in terms of functionality implemented?

      • Metric 4: Velocity/productivity in terms of story points. Compare the average per sprint with the velocity measured over the system as a whole. Expect an upward trend initially as the team gets up to speed. Expect the trend to move toward a stable rate of “burning” story points each sprint (assuming similar conditions such as team composition). If version control is not present, we expect productivity to be lower, as it causes more overhead to check and merge code adjustments. The story point metric clearly assumes that story points are estimated consistently by the team. To use this metric we need to assume that on average a story point reflects approximately the same amount of effort.
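The comparison in Metric 4 can be sketched as follows, assuming you have the number of story points completed in each sprint. The function name, the three-sprint window, and the sample numbers are hypothetical choices for illustration.

```python
from statistics import mean
from typing import List, Tuple


def velocity_summary(
    points_per_sprint: List[int], window: int = 3
) -> Tuple[float, float]:
    """Return (overall average velocity, average over the last sprints).

    `window` is the number of most recent sprints to compare against
    the average over the whole history.
    """
    if not points_per_sprint:
        raise ValueError("need at least one sprint")
    if window <= 0:
        raise ValueError("window must be positive")
    overall = float(mean(points_per_sprint))
    recent = float(mean(points_per_sprint[-window:]))
    return overall, recent


# Made-up sample: story points completed in five consecutive sprints.
overall, recent = velocity_summary([10, 14, 18, 20, 22])
```

A recent average well above the overall average suggests that the team is still getting up to speed; the two values converging suggests a stable rate.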

Note

Note that the initial measurements can serve as a baseline for later comparison. Often, the trend is more meaningful than the (initial) values.

Common Objections to Version Control Metrics

When you have a version control system in place and you adhere to the two most important conventions of version control, you are in a better position to measure the effectiveness of your development process. Common objections to the best practice in this chapter are that using different version control systems inhibits analysis, or that specific measurements are unfeasible because commits are hard to trace to issues.

Objection: We Use Different Version Control Systems

“We cannot meaningfully measure branch lifespan because one part of the team likes Git while another prefers Subversion.”

There seems to be a problem underlying this that is more important than the measurement. Measuring the effectiveness of your version control is unfeasible if you use version control in an inconsistent manner. Using different version control systems increases the complexity of merging and partially offsets the advantages of version control. As a team, you will need to choose one version control system in order to achieve consistency.

Objection: Measuring the Recommendations Is Unfeasible (for Example, Whether Commits Are Specific)

“We cannot determine in all cases what commits refer to specifically. We would need to read through all commit messages.”

Consider that it is a matter of discipline whether commit messages are specific. In order to find out whether commit messages are specific, you could sample commit messages or ask developers directly. Many version control systems help you to be specific in pushing changes by requiring a commit message, but this can also be enforced technically. This could be done by adding a pre-commit check in the version control system that checks whether the commit message contains a valid ticket identifier.
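Such a check could look like the sketch below. It assumes ticket identifiers of the form `PROJ-123` and only verifies that the message contains something of that shape, not that the ticket exists in the issue tracker; the pattern and function names are hypothetical.

```python
import re
import sys

# Assumed ticket format such as "PROJ-123"; adapt to your issue tracker.
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")


def has_ticket_id(message: str) -> bool:
    """True if the commit message contains a ticket-shaped identifier."""
    return TICKET_PATTERN.search(message) is not None


def check_message_file(path: str) -> int:
    """Return 0 if the message stored in `path` is acceptable, else 1."""
    with open(path, encoding="utf-8") as handle:
        message = handle.read()
    if has_ticket_id(message):
        return 0
    print("Commit rejected: no ticket identifier such as PROJ-123 found.",
          file=sys.stderr)
    return 1
```

In Git, for instance, a check like this is typically wired in as a `commit-msg` hook, which receives the path of the file holding the message as its first argument; the hook script would then end with `sys.exit(check_message_file(sys.argv[1]))`, and a nonzero exit status aborts the commit.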

Metrics Overview

As a recap, Table 4-1 shows an overview of the metrics discussed in this chapter, with their corresponding goals.

Table 4-1. Summary of metrics and goals in this chapter

Metric # in text  Metric description                                          Corresponding goal
VC 1a             Number of developers that did not commit the last 24 hours  Preventing long-lived development branches
VC 1b             Average branch lifespan                                     Preventing long-lived development branches
VC 1c             Commit frequency                                            Preventing long-lived development branches
VC 2a             Average issue resolution time                               Team productivity
VC 2b             Percentage of issues raised and solved within sprint        Team productivity
VC 2c             Average issue resolution time for each level of urgency     Team productivity
VC 3a             Percentage of issues resolved within 1 day                  Team productivity
VC 3b             Percentage of issues resolved within 3 days                 Team productivity
VC 4              Velocity in terms of story point burn rate                  Team productivity

See also Chapter 5 on controlling different environments: development, test, acceptance, and production. This avoids surprises such as failed tests due to unequal testing and deployment environments.
