In his CM: the Next Generation series, Joe Farah gives us a glimpse into the trends that CM experts will need to tackle and master based upon industry trends and future technology challenges.
When it comes to CM plans, each project has to clearly identify its own strategies for branching, and subsequently for labeling and merging. There are many different practices for branching, and many different philosophies. Each CM group will vigorously defend its practices and document them in a "branching strategy" document for use by the development team.
In this article we explore a potential standard for next generation branching, one which requires advanced capabilities and processes, but one which can significantly reduce CM complexity and increase CM automation. I use it myself and with a number of clients, and I've seen many other companies migrate in the direction of stream-based branching, in part or in whole, with a goal toward clearer, less complex, more automated CM.
Branching should always arise from one common theme: the need to support the existing codeline. From the ground level, this is not always an easy call. For example, if a build is sent to the verification team, and I'm starting on a new feature, do I have to create a branch so that I can continue to support the code that went to the verification team? What if I'm changing an API? Do I need to create a new branch so that it doesn't affect other existing code? You may have very definite answers to these questions, but the simple answer to both of them is: It depends. It depends on whether or not you have to support the previous code line. Wait a minute. Isn't that a circular argument?
Let's take an example and say that I change an API. If it's not yet being used by anyone, there’s not a problem. If it is being used, but the changes are all upward-compatible and not requiring other code to change, again, there’s not a problem. If the changes do require other code to change, but I package the changes to all the other code as part of the same change package, again no problem. If I'm just making a small innocent change which is totally upward compatible, requiring no other code changes, but the release baseline is totally frozen in preparation for launch day, then, no way! We don't want any changes that could possibly affect the release schedule - and simple software changes that go wrong have a certain track record. So, it depends.
Branching is always a gut-wrenching issue. If I branch, I have a new branch to support, to label and, possibly, to merge. If problems come up that need to be addressed while the branch is in effect, I have another place where they have to be fixed. How do I decide when the branch will merge back? What happens if I determine a new branch has to be created off the old branch for another reason? How will I eventually reconcile them? What about a branch off the new branch? Now let's look at the new release development. Is the same thing happening there? I'm responsible for 100 files; how do I keep my sanity through all of this? Or what if I change jobs? How does the next person deal with it all? This is why we have branching strategies.
A branching strategy is far from a branching standard. Give me two different developers, or a developer and a CM manager, and I'll show you two different points of view on branching. Why are there so many different branching practices? Is it possible to focus on one or two practices that will do the job in a large number of cases?
Let's start out by looking at branching from above the forest. It could be very simple: branch into a new release for changes that are not to be included in prior releases, and otherwise don't branch. That covers the parallel release management view. But what about all of the other reasons we branch?
I've said it before, but we really need to look at it closely this time: The use of the branching capability is terribly overloaded. And it is terribly overloaded because our processes and tools don't address the real set of CM problems head on. We're going to explore this in more detail, but first, let's do some "dreaming".
A Simple Stream Branching Recommendation
Let's say branching was done only to support parallel release management, at most one branch on each file per release. Too simplistic? Perhaps, but I don't think so. In fact, for the last 30+ years I've said, "I don't think so". On 2 person projects, 20 person projects, 200 person projects, 2000 person projects - this is a concept that needs to be embraced. You might not be convinced, though, so tell me the problems with it after you've read the rest of this article. I'm sure we'll generate some intense discussions.
To start, assume branching were done only to support parallel release management. Well then, here are the recommended rules for the branching strategy from this perspective.
- Products, directories, files may all have branches
- Product branching follows the product road map of releases (i.e., product branches, a.k.a. release streams)
- Everything else follows a subset of the product branching layout
- Branching is never done to support a variant
- Change requests (ECRs, problem reports, tasks/activities, requirements, etc.) are targeted to a specific release stream.
- A change package, or update, as we will refer to it for clarity, is authorized by the approval of a change request.
- Each update is targeted to a specific product release (i.e., the stream of the update)
- Files are checked out against the update which collects all of the changed/changing files together.
- When a file is checked out, if there is no file branch for the update stream, one is created (and labeled with the stream)
- If there is a file branch for the update stream, a new revision is created along that branch.
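The last two check-out rules are mechanical enough to sketch in code. The following is only an illustration under assumed names: `FileHistory`, `Update`, `check_out` and the `stream.N` revision numbering are hypothetical and do not describe any particular CM tool.

```python
from dataclasses import dataclass, field


@dataclass
class FileHistory:
    """Hypothetical per-file record: at most one branch per release stream."""
    name: str
    branches: dict = field(default_factory=dict)  # stream name -> list of revision ids


@dataclass
class Update:
    """A change package targeted at exactly one release stream."""
    update_id: str
    stream: str
    files: list = field(default_factory=list)  # (file name, revision id) pairs


def check_out(file: FileHistory, update: Update) -> str:
    """Apply the check-out rules and return the id of the new revision."""
    # If the file has no branch for the update's stream, create one.
    # The branch is labeled with the stream name, so no manual label is needed.
    branch = file.branches.setdefault(update.stream, [])
    # Either way, the new revision is added along the stream's branch.
    revision = f"{update.stream}.{len(branch) + 1}"
    branch.append(revision)
    # The update collects every file changed on its behalf.
    update.files.append((file.name, revision))
    return revision
```

A file that so far exists only in `rel1` would gain a `rel2` branch on its first check-out against a `rel2` update, and further check-outs would simply extend that branch.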
So, whether or not you agree that this is a feasible approach, we now have a branching process that has several very interesting properties not always seen in a project.
- The branching strategy is easy to learn. In fact, the CM tool set can automate it.
- There is a simple naming/labeling convention for any branch: its stream name. No manual labeling should be needed.
- All files have the same branching layout, which is a subset of the product road map of releases.
- Technology transfer of the project does not involve a long analysis of the state of current branches and resolution of them.
Now let's go one step further. If I want to work on a particular release of a given product, all I should need to specify is the product and the stream (and perhaps a promotion level, but we'll get to that later). The CM tool should be able to infer everything I need to know from that simple context specification: product + stream [+promotion level].
If I need to know what's on my to-do list, the tool should present me with the list of items based on my context. If I need to check out a file for which there is no branch in the context stream, the tool should be able to look at the product road map and determine which stream to pull (and in this case branch) the source code from. If I'm just retrieving (as opposed to "checking out") a file, the same rules apply. The CM tool knows what the branches of each file mean because it has access to the product road map. I don't have to write a context view specification or deal with explicit file revision numbers.
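As a rough sketch of how a tool might infer the source stream from the road map, assume the road map is stored as a mapping from each stream to the stream it was started from. That representation, and the name `resolve_stream`, are assumptions made for illustration only.

```python
def resolve_stream(road_map, file_branches, context_stream):
    """Return the stream whose branch supplies a file in the given context.

    road_map:       stream name -> parent stream name (None for the first release)
    file_branches:  set of stream names on which this file has a branch
    Assumes the road map has no cycles.
    """
    stream = context_stream
    while stream is not None:
        if stream in file_branches:
            return stream
        # No branch here, so fall back to the stream this one was started from.
        stream = road_map.get(stream)
    return None  # the file does not exist anywhere along this line of releases
```

With a road map of `rel1`, then `rel2`, then `rel3`, a file branched only in `rel1` resolves to `rel1` when the context is `rel3`. A retrieve reads from that branch; a check-out would branch from it.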
What if I need to merge a change from one stream to another (i.e., change propagation)? Again, the CM tool knows the product history, and the set of files involved and their histories. It should be able to calculate which files of the change need merging, what the ancestors are, and walk me through (in case there are any conflicts) bringing me to a point for re-testing.
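One way the "which files need merging" calculation could work is sketched below. It rests on an assumption that is mine rather than a stated rule: if the target stream still resolves to the source stream's branch for a file, the change is already visible there and needs no merge. The function names and the parent-stream road map are hypothetical, and finding the common ancestor for the actual merge is left out.

```python
def effective_branch(road_map, file_branches, stream):
    """Walk up the road map to the nearest stream on which the file has a branch."""
    while stream is not None and stream not in file_branches:
        stream = road_map.get(stream)
    return stream


def files_needing_merge(road_map, branches_by_file, update_files, source, target):
    """Return the files of an update that need a merge to reach the target stream.

    branches_by_file: file name -> set of streams on which the file has a branch
    update_files:     names of the files changed by the update in the source stream
    """
    return [
        name
        for name in update_files
        # A file whose target view differs from the source branch needs merging.
        if effective_branch(road_map, branches_by_file[name], target) != source
    ]
```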
I don't have to worry about switching the main trunk from one release to another, because each release has a main trunk which continues forever. I don't need to set the timing, or make sure everyone has their merges done for the switchover. I don't have to worry about what the process is for a change that is needed in a release before or after it occupies the main trunk - it's the same for all releases for all time.
All I Really Need
So what do I need to know how to do in this scenario? As a developer, I need to know how to:
- Specify my context (although hopefully it's remembered until I next need to change it)
- Find my to-do list and start my changes (i.e., create my updates) based on an item in it. The CM tool should automatically restrict my to-do list to my context, and tag the update with my context, and reason for change
- Go to the source tree and check-out a file, or perhaps inspect it to see if I need to check it out. The CM tool should branch it for me (possibly with a confirmation option) if necessary, and it should attach it to my update
- Review (i.e., via delta report) and check-in my updates
- Promote my update (i.e., after it's checked in tell the build team when it's OK for them to take it)
- Occasionally select an update from a different release stream and propagate it to my current stream
I don't need to know:
- What the branching strategy is for the project and how to decide whether or not to branch a file
- How to branch and how to label a branch
- How to merge branches, and what to do with them after they are merged
- How to write a context view specification so that I pick up the right file versions
- How to link my reasons for a change to the modified file revisions
Hopefully you'll conclude that the "need to" items are a lot easier than the "don't need to" items. Perhaps I've left out a few from each list.
Back to Reality
This is all very well and good, but if the branching model is too simplistic, how can I do all of the things I really need to do? Are we not simply going to suffer elsewhere? The answer is yes. You are going to suffer tremendously if you don't have the process and tools in place to help you realize this model. Otherwise: No. So let's look at this in more detail.
Why do you create branches today? Before tossing all of this out as too simplistic, let's do the analysis. Why do you branch? First off, to support parallel releases. We've already covered that one and that's a good reason. But there are plenty of other reasons as well:
- To support parallel releases
- Because the main trunk is the wrong release
- To have a place to promote file revisions through the workflow: Ready for build, integrated, tested, production
- So that files can be checked in to the repository without breaking the build
- To support parallel checkouts of the same file so that work isn't held up
- So that files for a particular update can have their branches commonly labeled to collect them together
- To collect and label items for releases, baselines, builds or other such purposes.
- To identify, easily, the work done by a particular developer
- To allow back-ups of my local work before it's ready to be "checked in" to the main branch
- To support product variants
We need to do all these things or at least most of them: work on a different release than the "current" main trunk, have promotion levels, perform parallel checkouts, collect files into changes, releases, builds, baselines, etc., have our local work backed up, prevent premature changes from breaking the build, identify our work, and so forth. These are all things we need to do.
Branching is a very powerful concept, and a capability provided by virtually all CM tools. In fact, if branching isn't supported, it's not really a CM tool. The question is: Why use branching to do all these things we need to do? The simple answer is that branching is powerful and, along with labeling, it lets me do all these things. The real answer is that your current processes and tools are insufficient to let you do these things without branching. Without branching and labeling, they just don't let you do what you need to do.
I'm all in favor of branching to support parallel releases. But, quite frankly, in over a quarter century of supporting dozens of companies in their CM practices, I haven't really seen another need for branching. What I've seen is the need to do all these things - but not by branching. Instead the use of good process and good tools will suffice while reducing complexity dramatically.
To Branch or Not To Branch
Before going on, I want to refer you back to our utopia, that is, our recommendation - where we didn't need all sorts of branching and the CM tools could help automate and simplify life for the developer, not to mention the CM manager and others. Oh, and let's not forget the technology transfer nightmare that can be avoided by not having 10,000 files, all with different spaghetti branching structures. No fights about who's doing the merging or when the main trunk is going to switch releases. This is just the motivation paragraph to get you to read on. If you've seen other motivating factors, great. Tell us about them, but let's read on.
We're going to evaluate each of these items and determine how branching is, in most cases, easily avoided. The goal here is not to do a lot of extra work and such to eliminate the need to branch. The goal instead is to eliminate all the work that is necessary because of branching, while just doing what has to be done anyway. If you want your CM to move fully into the next generation, this is what you'll have to wrestle with.
Let's look at each branching requirement.
1. Support of Parallel Releases: This is a real and valid use of branching. Just make sure the branches follow the product road map (which may change from time to time). Better yet, let's get the CM tools that make sure they do.
2. Main Trunk is the Wrong Release: If you're working on a change that's not for the main trunk, the tendency is to create a branch for that change that can be merged into the main trunk when it switches to the right release (at least when it's future work). Instead, since we have a branch per stream capability, you simply make the change in the stream for the release on which you're working. In fact, you no longer need a "main" trunk. Instead, you set your context to the stream you're working on, whether it's a support stream, future development or the current development stream.
3. Work Flow Promotion: Ready for build, integrated, tested, production: Most shops have some sort of work flow promotion for their updates (or in many cases files). It may be as simple as: in development, integration, production, or it may be more elaborate.
First of all, work flow promotion belongs on the update (i.e., the change), not on a file. To do otherwise is to risk getting partial changes promoted. CM tools or processes that do not support some form of change packaging are going to cause problems.
The next thing to notice is that many processes are geared to allow any update to be promoted in any order. This allows maximum flexibility. Well, not quite. This pessimistic view (i.e., that any update can be promoted in any order relative to other updates) requires a process that has separate branches for each promotion level and requires merging at each promotion step. Who is responsible for the merging? When does it occur? What about the separate post-merge testing? What happens if the process needs a new promotion level inserted?
In the optimistic view we realize that updates are normally (in fact almost always) checked in to the repository in the same order that they will be promoted. We're not talking about a strict update-by-update ordering here, but rather an ad-hoc grouping of updates promoted in the same order that the group was checked in, as promotion beyond the developer tends to occur in batches. For example, each night, all checked in updates might be selected for the build. At first glance this appears to reduce the flexibility, but let's look at this in more detail. This view of the world actually works very well, in both small and large projects, but requires some good software engineering discipline.
If developers are arbitrarily making changes and checking them in without testing, there are going to be a lot of failures and the resulting updates are certainly not always going to flow through the system in the same order. This model is simply not going to be valid. However, if code is unit tested and peer reviewed before being checked in to the repository, odds are that your processes are geared to the expectation that checked in code will be promoted.
Typically, checked in code is rolled into a build and only if there are build problems are some of the updates rolled back, corrected, or supplemented with a fix (i.e., another update). How often do you roll back updates as a percentage of all updates selected for your builds? Likely this is a very low percentage. So instead of managing parallel branches for your promotion levels, you could be dealing with this very low percentage (which you have to deal with anyway) as a higher priority item. If this is the case, then your promotion levels, instead of being separate branches requiring merges and additional administration, could simply be configuration lines drawn at various promotion levels in your stream branch.
Ideally, with adequate tools, these lines are drawn automatically, based on the context view. This is where promotion level (mentioned earlier) comes into play. You specify the promotion level you wish as part of your context (along with product and stream), and your view now reflects all revisions that are (i.e., whose updates are) at a status of that promotion level or higher, instead of always getting the latest. Because your tools are aware of this promotion level capability, they can be adjusted to automatically provide such views, based on the Update status, without the need for writing configuration specs. This is not easy to do with branch promotion management because branches are used for a wide variety of purposes, many of which are blurred.
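A minimal sketch of such a view calculation follows. The level names are illustrative, the data layout is assumed, and `view_revision` is a hypothetical helper rather than a feature of any specific tool.

```python
# Promotion levels from lowest to highest; the names are only examples.
LEVELS = ["checked_in", "ready", "integrated", "tested", "production"]


def view_revision(revisions, update_status, level):
    """Pick the revision of one file that a context view should show.

    revisions:     (revision id, update id) pairs along the stream branch, oldest first
    update_status: update id -> current promotion level of that update
    level:         promotion level requested in the context
    """
    threshold = LEVELS.index(level)
    # Scan from newest to oldest and stop at the first revision whose
    # update has reached the requested level or a higher one.
    for revision_id, update_id in reversed(revisions):
        if LEVELS.index(update_status[update_id]) >= threshold:
            return revision_id
    return None  # nothing on this branch has reached the requested level
```

This relies on the optimistic view: updates are promoted in roughly the order they were checked in. If a later update were promoted ahead of an earlier one on the same file, the selected revision would also carry the earlier, unpromoted change.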
In a main trunk view, the promotion levels apply to revisions across the main trunk, and there are some complexities here. But in a trunk per release (i.e., stream) organization, promotion levels apply to the release trunk (i.e., stream branch) for each file.
One of the key advantages of this approach, if your tools support promotion level views, is the ability to roll back the state of an update. The view, that is, the promotion level configuration line, should adjust itself automatically, as long as this is an automated capability in your tool. Now it's not quite that easy. When you roll back an update, you must also inspect the dependent updates that are affected. They too will have to be rolled back. Hopefully, though, your tool automatically makes you aware of such dependency violations whenever you roll back (or promote!) the status of an update.
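The dependency check on roll-back might look something like the sketch below. It assumes the tool records, for each update, the set of updates it depends on, and that promotion levels can be compared as integer ranks; both are assumptions for illustration, as is the function name.

```python
def dependents_to_roll_back(depends_on, rank, update_id, new_rank):
    """List updates left in violation when update_id is rolled back to new_rank.

    depends_on: update id -> set of update ids it depends on
    rank:       update id -> current promotion level as an integer (higher = further along)
    """
    seen, stack = set(), [update_id]
    while stack:
        current = stack.pop()
        # Collect every update that depends, directly or indirectly, on this one.
        for other, prerequisites in depends_on.items():
            if current in prerequisites and other not in seen:
                seen.add(other)
                stack.append(other)
    # A dependent is in violation only if it sits above the rolled-back level.
    return {u for u in seen if rank[u] > new_rank}
```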
One other thing to keep in mind is that while it may always (or usually) be fine to select all checked in code for a build, as you approach the end of release development, you do want to carefully control what code can be checked in, or more precisely, what work is authorized for creating an update in the release stream. Presumably, if you can't create an update in a given stream, you can't check out files to make the changes.
4. Files can be checked in to the repository without breaking the build: One of the by-products of branchless promotion levels is that you can easily insert new promotion levels. There's no need for a new set of promotion branches, just an understanding of the promotion level, and perhaps a bit of data massaging.
One such promotion level distinction, not found in many projects, is the distinction between a checked-in file and one that is ready for the next build. For example, two updates may be co-dependent and completed by different developers. The first may finish his/her work early while the second requires a few more days. In some shops, the first developer simply keeps the files for the update in the workspace until the second update is ready. This causes the need for more parallel checkouts because the files are sitting in the workspace for an extended period of time. In other shops the files are checked in to a new branch, with the Update identifier as a tag, while awaiting the work of the other developer. In this scenario, at least the files are visible to other users, although branching, and subsequent merging, is required, even though it may be trivial.
Introducing a "ready" promotion level between the "checked in" and "in the build" states (a promotion level is really just a state of the object) allows the code to be checked in without having to create a separate branch and without the danger that it will get pulled into the build before the "ready" status is achieved. In the meantime, it may be checked out for additional changes by the same or a different developer without forcing a parallel update which would later have to be merged, as the new update can be performed on top of the first developer's changes.
5. To support parallel checkouts of the same file so that work isn't held up: We have just seen one example of how branching for a parallel update can often be avoided. More commonly, parallel changes overlap more than just at the waiting stage. Some shops disallow parallel updates, as a rule, to eliminate the extra complexities in process and testing. They enforce exclusive checkouts. While this is an admirable goal to simplify process, it can cause significant delays. An approach such as the ready promotion level can help to reduce delays.
An additional approach, whereby an update is broken into multiple less complex updates, all tied to the same change request, can further reduce the need for parallel updates or exclusive delays. For example, it is common for a few "header" files which define message codes, run-time IDs or other frequently expanding definitions to have significant checkout contention. When the check-in of the addition of one message code has to await the completion of the related feature, there's a significant chance of change overlap. By breaking the change implementation into multiple updates, with the first update just adding the message code, and perhaps a stub function to handle it, the chance of overlap can be significantly reduced.
Even after all of these process improvements, there may still be the need for somewhat frequent parallel updates (or delays if we take the exclusive checkout approach). So this is when you need to branch to support parallel check outs, right? Not quite. Your update has already announced your checkout to the CM repository. For the vast majority of such changes, there's really no need to create a parallel branch as well. After all, your workspace holds the file until you're ready to check in your update. If there are multiple users with updates in parallel, this does not signal the need for parallel branches.
In fact, some tools and processes which otherwise require parallel branches provide a means of subsequently removing or hiding such branches because they really add nothing to the CM history. They only serve as placeholders, either because the tools don't support some form of change packaging or because, even though they do, they don't allow parallel checkouts without branching.
I'm all in favor of minimizing the need for parallel checkouts. I'm even in favor of those companies which, having minimized the need, find it acceptable to use only exclusive checkouts. Often that strategy will even improve software design, forcing the design to minimize contention. But if you are going to do parallel checkouts, for reasons other than parallel release changes, I would be very reluctant to recommend branching to accomplish it, even to the point of suggesting an exclusive checkout solution.
6. Files of a particular update can have their branches commonly labeled to collect them together: Most CM tools either don't really support change packaging or had the concept added late in the game. Some of these tools and processes require, if you want to collect files together into an atomic update, that you simulate the change packaging by creating a branch for each file of the Update and labeling it with the change request or, even better, the update identifier. The branching is done specifically to support the labeling. The proper implementation of change packaging using first order update objects makes this use of branching redundant. Updates are not tags on a file. Nor are they references to a task or tag.
In fact, the update should be the central object in your CM repository, serving as the key component in the traceability hub. The update relates your marching orders to the files that have changed. Their states are used to identify configuration views, and as we'll see, to create configuration line ups.
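To make the idea of the update as a first order object a little more concrete, here is a minimal sketch. The field names, the `trace` helper and the sample identifiers in the usage note are all hypothetical; a real repository would hold this in its database rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    """A change package that ties the reason for a change to the revisions it made."""
    update_id: str
    change_request: str  # the "marching orders" that authorized the work
    stream: str          # the release stream the update is targeted to
    developer: str
    status: str = "checked_in"  # promotion level; drives the configuration views
    revisions: list = field(default_factory=list)  # (file name, revision id) pairs


def trace(updates, file_name, revision_id):
    """Return the update that produced a given file revision, or None."""
    for update in updates:
        if (file_name, revision_id) in update.revisions:
            return update
    return None
```

Given an update `u1` for change request `CR-7` that produced revision `rel2.1` of `a.c`, calling `trace(updates, "a.c", "rel2.1")` returns that update, and with it the reason, stream, developer and status, without any per-file labels or branches.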
7. Collecting and Labeling File Revisions for releases, baselines, builds or other such purposes: Many projects which lack the tools for tracking releases, baselines and builds as separate first order objects use branching for such definition tracking. Realizing that, for example, a release, no sooner than it is out the door, will require some fixes, they use branching both to track the definition and to provide a means for support. They then extend this concept to significant builds, and possibly to baseline definitions in general.
If, instead, your tools support the definition and tracking of releases, baselines and builds as first order objects, the promotion level management capabilities can be used to create release and other baseline-related configuration lineups without the need for separate branches.
Think about it for a minute. A release does not create any new revisions of any files. It simply identifies the revisions of files that have already been created and tested along the promotion path. Similarly any build or baseline definition is simply selecting pre-existing revisions. Some tools and processes are smart enough to recognize this and do not force the creation of separate branches, perhaps using labeling instead. Some technology is not good at managing labels, because it's too difficult or because of labeling limits or because of performance issues. In such cases, creating separate branches may simplify the labeling problem, but it adds complexity of its own.
There should be first order objects for managing such items. A build for example, has to pass through several levels of promotion: integration testing, verification, field trials, and maybe all the way to production. Different builds will end up at different promotion levels. This information needs to be tracked with the build, including when the build reached each state and what the ultimate plans are for the build: a nightly compile for development environment or a production build for delivery.
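A build tracked as a first order object might carry data along these lines. This is only a sketch: the class, its fields and the level names are assumptions, not the schema of any existing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Build:
    """A build record that selects existing work and tracks its own promotion."""
    build_id: str
    stream: str
    purpose: str  # the plan for the build, e.g. "nightly" or "production"
    update_ids: list = field(default_factory=list)  # updates included in the build
    history: list = field(default_factory=list)     # (level, timestamp) in order reached

    def promote(self, level: str, when: datetime) -> None:
        """Record that the build reached a promotion level at a given time."""
        self.history.append((level, when))

    @property
    def level(self):
        """Current promotion level, or None if the build has not been promoted."""
        return self.history[-1][0] if self.history else None

    def reached(self, level: str):
        """Return when the build first reached a level, or None if it never did."""
        for name, when in self.history:
            if name == level:
                return when
        return None
```

Nothing here creates file revisions or branches. The build only refers to updates that already exist and records its own progress.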
Trying to simulate a first order object with branches and labels is adding complexity to your environment where none should be needed. Your users want to look at and use the builds, not the set of files with a given tag at a given point in time. Forcing the branching of all of the files in the system just to track these things is really overkill, no matter how proficient the branching mechanisms are.
8. To identify, easily, the work done by a particular developer: Maybe you're stuck at this one. I find this one hard to believe. But I have seen shops where they force branching down to a level where individual developers can be identified. This way they can trace who did the code changes from the CM meta-data, without having to go into the delta-compressed file meta-data. I do find it useful to be able to identify work by user - but not by branching. The Update identifier or related object data should clearly identify the developer. There should be no need to branch in order to track this information.
This is an example of how some shops expand the use of branching to implement features. This is dangerous. Leave branching alone. It has a purpose. If you need additional data, add a database to your tool or switch to one that can provide the data mechanisms out of the box.
9. Back-ups of local work before it's ready to be "checked in" to the main branch: Now here's a valid requirement. My work is on my workspace, on my laptop. I'm making changes daily and it's not getting backed up, or if it is, it's distributed over 300 other desktop and laptop backups. It'll take hours, if not days, to identify where the backups are and to recover it if necessary. So the solution is to put it into the CM repository. Then it gets backed up in a common place whenever I want.
But if I want to back up my work nightly, I'll have to put it into the repository nightly. That means, in some tools, I'll have to check it in nightly. That means lots of revisions being created. Of course I'd want them in a side branch and not in the main branch of each file. So this must be a valid use of branching, right? It is not, even if your current tools and processes don't allow you to do nightly backups otherwise. Many CM tools allow you to shelve your work so that you save the current contents without creating new revisions of the file, or perhaps creating new revisions that are not visible. The latter case may cause some performance issues though. But either way, your CM tools and related processes should allow you to save your Update-in-progress without having to clutter up the CM information, and especially without having to create branches, even if they're temporary.
10. Variant Handling: So many projects start out as a one-off, and then split off into dozens, if not hundreds, of variants of a product. Managing large numbers of baselines, builds, etc. is not only tedious and resource intensive, but it's risky. Sooner or later you're going to reach a point where you can't cope. I've seen it more than enough times. The solution taken is typically branching and labeling and still the problem persists, but now as organized chaos.
Variant management is primarily a software engineering problem, not a configuration management problem. Your team must specify design criteria which will make CM manageable. CM should not attempt to solve software engineering problems which are rooted in the design, though it may help you to identify them. Variant handling is one area where the problem needs to be addressed in the design.
It is not difficult to avoid dozens of variants. In fact, it's much, much easier to reduce variant programs into run-time options and configurations than not to do so. Introduce a simple design rule that a variant will not be a build time variant, or a load time variant, if it can be a run-time variant. Whether it's a sizing option, a language option or a set of feature configurations, you can create separate images that support each of these. You can just as easily add a requirement to the design team that says, the size will be configured at run-time by setting a size parameter, language through a run-time language setting, etc. The main valid reason for variants to be handled in the CM process is different platform support. This requires separate executables for each platform, but should never require branching.
In extreme cases you may be space constrained and can't load all of the 100s of options into one image. In this case, options should be handled as separate files which can be co-compiled (i.e., have unique names and/or name spaces), and the problem is brought forward to link time, or perhaps even load time. Files to be linked or loaded are tagged with the feature options so that a build specification need only identify the feature options. Still, just the process of moving these options forward to link time will result in multiple deliverables that have to be managed, and multiple test beds that have to be reloaded. So it should be avoided whenever possible. In any case, the problem is not one which requires the use of branching to solve. The use of branching for variants comes from the reuse of the same file name for each of the different variants, rather than the use of separate file names to support each variant. Common code should reside in the common file, while variant dependent code should reside in a variant specific file (i.e., with a file name specific to the variant).
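The tagging scheme described above can be sketched as follows, assuming each file carries a set of feature-option tags and common files carry none. The function name and the example file names are made up for illustration.

```python
def select_files(tagged_files, build_options):
    """Choose the files to link or load for a build specification.

    tagged_files:  file name -> set of feature options the file belongs to
                   (an empty set marks common code)
    build_options: set of feature options named in the build specification
    """
    return sorted(
        name
        for name, options in tagged_files.items()
        # Common files are always included; variant files only when
        # at least one of their options was requested.
        if not options or options & build_options
    )
```

For instance, with `core.c` untagged, `lang_fr.c` tagged `french` and `lang_de.c` tagged `german`, a build specification of `{"french"}` selects `core.c` and `lang_fr.c`. Each variant lives in its own file name, so no file needs a variant branch.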
How hard is it to get the design to follow this rule? Not difficult at all. The most difficult step, by far, is communicating the rule, and the need for it, to all developers, and then ensuring peer reviews identify violations. Apart from that, it is a trivial design change, and one that yields way more benefits than just avoiding variant branches.
A Next Generation Branching Standard?
I apologize for the length of this article. If I had identified additional reasons for branching, you'd see an even longer article. But it was necessary to justify each of the cases, and if you send me some others, I'll deal with them as well.
OK, you might not understand everything here. Or maybe you do but you don't agree with it all (please send your comments). But do you at least agree with the overall philosophy that branching is overloaded? If you do, then I really, really recommend you go back to the motivation paragraph. We're not talking about a 5% or 10% reduction in complexity here. No, it's more like the difference between Assembly language code and Python scripts. When branching is simplified, automation increases.
Am I recommending stream-based branching as a CM best practice? Not entirely, at least not at this point in time. It is one way to bring sanity to branching but it won't work for all projects for various reasons. I know of several companies which have used, or moved toward, this future standard for their branching, often with their own home grown tools or scripting. But I also know of projects in some of these same companies that have stuck with various other branching strategies and patterns.
Why a standard as opposed to a branching strategy or pattern? Well, a strategy is a good, perhaps rigid, guideline, and possibly even more beneficial to an organization than a standard. However, in order for the industry to embrace it, a standard is required. When a standard is embraced industry-wide, in whole or in part, the industry tools can anticipate behavior. So, for example, if you have to move from one tool to another, your existing branching strategy may not map onto the new tool. This can turn what might be a few days' work into a few months of work. If the branching strategy is supported across tools, you will not have to limit your choices. The industry CM tools can also help to automate CM more effectively when specific standards are in place. It is not a downside that not all CM projects will embrace the standard. Not everyone uses Bluetooth. Not everyone uses ISDN. But for those who need those communication protocols, having them as standards makes all of the difference.
Don't just make a switch. Look at the various branching reasons listed here and add your own. I would really appreciate it if you added your own as comments or references to this article, as it would help everyone. Take incremental steps, reducing branching complexity until you're more comfortable with a full cutover. You won't realize most of the benefits until you reach a state of stream-based branching, and even then, it depends on the tools you're using to support such a capability.
But from decades of experience with this form of branching, I'm convinced that the next two generations of CM tools will see a vigorous level of support in this area. Many corporate processes, and some CM tool vendor guidelines, already embrace it in one form or another. The benefits are just too great for it not to emerge as a branching standard in the next generation of CM technology.