Making Incremental Integration Work for You

Summary:

Recent CM Crossroads posts have suggested that a branch-per-change branching strategy is good because it gives you the ability to maintain a stable "main" trunk, while integrating a change at a time if you want. As Joseph Reedick put it in one of his responses:

"You're trading greater complexity, more merging, and a greater chance for mistakes for the ability to deliver small changes sooner. Even then, with all the merging going on, you'll probably have to re-test successfully delivered changes as other changes get merged on top of them (to make sure the merge didn't break previous code or roll back previous changes.)"

The obvious question is: How do I allow a change-by-change integration strategy to proceed without the negative side effects? Perhaps you'll normally integrate a small batch of changes, and only occasionally will that batch be very small. Here's where both good tools and good process are critical. They must support change management. More than that, though, they must support incremental development environments, incremental impact analysis and the ability to roll back changes easily.

If you're doing more frequent integration cycles, you'll likely want to be able to move on to your next cycle before the current one is completed. As a result, you may need multiple promotion levels whereas your previous process may have supported only one or two.

There is a key property, which, for some reason, most CM processes and vendors seem not to have caught on to yet. In a broad system of hundreds or thousands of files, most changes will move successfully through the process in the order they're put into the system. This is especially true if you have controls on your product development and support streams which allow you to easily support restrictions on check out operations - or at least on check-in operations.

Why is this a key factor and how have we missed the boat? Most processes and tools are set up to provide stable environments, some promotion levels, etc. To do so, they use branching to support promotion models. This is very flexible because you can have as many branches as you need to support your promotion levels. There's a cost, though: the branch and merge operations, along with the labeling, all take effort. Some prefer the "pull" method, where the CM team is responsible for the merging and labeling. Some prefer the "push" method, where developers do the work and accept it as a cost of having good CM. I prefer neither.

The problem with this branching pattern is that it focuses on success for the worst case scenario - and yes it can certainly handle it. But the cost is high. Instead a strategy that focuses on the most typical scenario and still allows the worst case scenario to be handled not only is less costly, but will almost surely result in better unit testing (i.e. developer testing).

Basic Assumptions
Begin with three basic assumptions about our CM tool and process:

·         A branch-per-release stream strategy is used at the product level

·         A file is branched only the first time a change is applied to it for a stream which does not apply to previous streams

·         The tool allows us to package files into a change, and target the change to a particular stream

The first assumption says files will follow a branching pattern that reflects that of the product. For the product, this does not mean that release 1.00 and 1.02 and 1.03 all have their own project branches. It does mean that release 1, release 2, release 3, and perhaps release 2b, all have their own project branches. A branch-per-release stream strategy will allow us to put things into parallel streams in parallel. No waiting until one stream is finished before starting work on the next (see Diagram). What's more important from a process perspective is that there's no change to the way I work in one stream when the next stream starts out. I set my view or context to the product and stream, and hopefully the CM tool guides me the rest of the way without any complex branching and labeling strategy.

The second assumption says we have a tool that will allow us to branch a file only if it has not yet been branched into the release stream I'm working on and a change needs to be made to it which does not apply to earlier streams. I don't want to branch a file just because I'm starting a new project stream. If I do, then during the first part of the new stream, most of my bug fixes would have to be made in two branches of the file unless I close down changes to the previous stream - and that's a bad reason to force customers to live with what they've got. So, for example, if file A has only a release 1 branch and a new release 3 feature affects this file, only then is a new (release 3) branch created. The CM tool must support this in two ways:

·         It must allow fixes to older streams to be propagated automatically [optionally] to later streams when the affected files have not yet been branched. More specifically, the propagation must follow the product road map so that it is only propagated to the correct set of streams. Ideally the CM tool will warn you when you make a change that is not going to be automatically propagated to future streams, and will track this information until you either merge the change forward or indicate that there is no need for the change to move forward (e.g., the problem was peculiar to release 1 only).

·         The CM tool must support views which automatically pick up appropriate branches for the context setting, in such a way that the onus is not on developers and other users to specify a set of rules to use. It must be simple. I want to set my context to view this product, this stream of the product. It must further support this concept whenever branching occurs so that it can suggest the appropriate branch point rather than placing the onus on the developer to figure it out. In the much rarer case when the developer wants to select a different branch point, (s)he is aware of this and can override the default.
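To make the two requirements above concrete, here is a minimal sketch of how a view might pick a branch and how a fix might propagate. It assumes a simple linear road map; the names `ROAD_MAP`, `visible_branch` and `fix_propagation` are hypothetical and do not come from any particular CM tool.

```python
# Hypothetical linear product road map, oldest stream first.
ROAD_MAP = ["rel1", "rel2", "rel3"]


def visible_branch(file_branches, stream, road_map=ROAD_MAP):
    """Return the branch of a file that a view on `stream` picks up.

    The view uses the file's own branch for the stream if it has one,
    otherwise the nearest earlier stream's branch, because the file has
    not yet needed a stream-specific change.
    """
    idx = road_map.index(stream)
    for candidate in reversed(road_map[: idx + 1]):
        if candidate in file_branches:
            return candidate
    return None  # the file does not exist in this stream


def fix_propagation(file_branches, fixed_stream, road_map=ROAD_MAP):
    """Split later streams into those that see a fix automatically
    and those the tool should warn about (merge or dismiss)."""
    later = road_map[road_map.index(fixed_stream) + 1:]
    automatic = [s for s in later
                 if visible_branch(file_branches, s, road_map) == fixed_stream]
    needs_merge = [s for s in later if s not in automatic]
    return automatic, needs_merge
```

With a file that has only a release 1 branch, `fix_propagation({"rel1"}, "rel1")` reports that releases 2 and 3 pick up the fix with no merge. Once a release 3 branch exists, the same call with `{"rel1", "rel3"}` flags release 3 as needing attention, which is the warning the first bullet asks for.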

The third assumption says that we have a CM tool suite that allows us to package files into a change, and target the change to a particular release stream. The more advanced the change management, the easier the incremental integration is going to be. For example, as we shall see, we will want change management to support a process model which includes promotion and roll-back operations at the change level (as opposed to the file or problem/feature level).

So, we still have requirements for rapid incremental integration. This means multiple promotion levels, as well. We still need to make sure that the development environment is stable. Can we do these things and still avoid a heavy load of branching, merging and labeling? I've been doing CM that way for over 25 years now, in projects of up to 30+ million lines of code, even though for the first 10 or so years I had to develop tools to support the effort.

Guidelines to Simplify Integration
Remember the key property: changes will normally flow through the system in the order they're developed. Organize your process and tools to make this the simplest case and then deal with the exceptions.

Rather than starting changes off in their own branch, changes will be applied directly to the stream branch. But we'll use a few guidelines and capabilities to help out. These are not hard and fast rules, but rather guidelines that are to be followed to avoid more drastic circumstances.

·         Absorb risky changes into a new stream as early on as possible, preferably before most of the developers have migrated to development in that new stream. This is basically a way of stabilizing a development stream early on in its lifetime, before the impact of instability is large.

·         Break up complex features in such a way that the non-upward compatible (NUC) portions of the feature are implemented up-front with hooks added so that the rest of the feature can be done as a series of incremental changes. For example, if changes are being made to message protocols, these are done early, and preferably in such a way as to respect backward compatibility. The functionality afforded by the changed protocols can be implemented in a series of subsequent changes.

·         Track promotion levels against changes, not against file revisions, and the tool will automatically identify promotion violations. A file revision will basically be open for change or closed (change completed). We avoid having to make sure that we promote all of the file revisions of a change in unison - instead, we just promote the change. As well, if we promote a change which depends on another change, either implicitly (e.g., changed the same file) or explicitly (e.g., specified in the change package data as a dependent change), the CM tool will clearly point this out and allow us to deal with it.

·         Choose a CM tool that will support automatic baselining and context viewing based on promotion levels of the change. This means that we can set up baselining to work only on sufficiently promoted changes - no need to hold up check-in operations. It also means we can view different promotion levels (e.g., latest checked-in, last night's compile, last system integrated changes, latest verification tested changes, etc.). By view, I mean that we can look at the code, do incremental bug fixes, create system build environments, etc. It also means that we can roll back a change from certain views by simply demoting its promotion level.

·         Support check-out (and/or check-in) restrictions. For example, when a stream is close to its final beta release, we may want to restrict changes except for those which fix problems on the release-gating problems list. In some shops we may even want to restrict changes to header files beyond a certain date, again in a product stream specific fashion.
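The guideline on tracking promotion levels against changes can be sketched as follows. This is only an illustration under stated assumptions: the level names in `LEVELS`, the `Change` record and both functions are hypothetical, and a real CM tool would keep this data in its own repository.

```python
from dataclasses import dataclass, field

# Hypothetical promotion levels, lowest first.
LEVELS = ["checked_in", "ready", "selected", "integrated", "verified"]


@dataclass
class Change:
    change_id: str
    files: frozenset                    # files touched by this change
    level: str = "checked_in"
    depends_on: set = field(default_factory=set)  # explicit dependencies


def prerequisites(change, history):
    """Explicit dependencies plus earlier changes that touched the same files.

    `history` lists all changes in check-in order.
    """
    prereqs = set(change.depends_on)
    for earlier in history:
        if earlier.change_id == change.change_id:
            break
        if earlier.files & change.files:   # implicit: same file changed
            prereqs.add(earlier.change_id)
    return prereqs


def promotion_violations(change, target_level, history):
    """Return prerequisite changes that are still below `target_level`."""
    by_id = {c.change_id: c for c in history}
    target = LEVELS.index(target_level)
    return sorted(p for p in prerequisites(change, history)
                  if LEVELS.index(by_id[p].level) < target)
```

An empty result means the change can be promoted on its own. A non-empty result is the promotion violation the tool should point out, leaving the decision (promote the prerequisites too, or hold the change back) to the person doing the promotion.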

Incremental Integration
With this in place, we'll proceed to do incremental integration as follows:

·         Have developers check-out and change files as necessary, directly in the release stream branch, and only as permitted by any restrictions. Encourage exclusive serial check-out and check-in, but allow parallel check-outs in the same branch (assuming our CM tool will support this), with a reconcile operation on check-in. Where there are explicit change dependencies, have the developer attach these to the change package.

·         Have developers unit test their changes before checking them in. Here, we refer to the unit of change as opposed to the module unit. Also, consider having the developer push checked-in changes to the ready state when the change (and associated dependent changes) are all ready for build integration.

·         Do nightly (or even more frequent) change selections from the checked-in, ready changes, promoting the changes to a selected state. At this point we might optionally create a new baseline, or if your CM tool permits, align one and freeze it only after a successful sanity test on the build. We would also create a build record identifying what is going to go into the build.

·         For smaller environments, rebuild the entire environment for shared use by the development team. For larger environments, we might do incremental builds, whereby we retrieve only the changed files and files affected by the changed files and perform either a make operation or an incremental build operation (i.e., re-compile these specific files). This will depend on your CM tool capabilities. Although you may wish to do builds more than once a day, you may wish to insulate the developer environment a bit more, perhaps making a new build available nightly.

·         Repeat the above steps for other promotion levels as necessary. For example, we may be supporting build environments for the latest ready code, the latest integration/sanity tested code and for both the verification test environment and the field trial environment. The incremental build environment option works really well when you're working with multiple promotion levels, when the builds take a long time, or when the promotion level environments are incremental to one another. Especially in this latter case, the cost of distributing four or more promotion level build environments is really not much different than the cost of distributing a single level - most of the time at least.
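The nightly change selection step in the list above might look something like this sketch. The state names and the dictionary layout are assumptions made for illustration; the point is only that selection walks the changes in check-in order and holds back anything whose prerequisites are not going into the build.

```python
def select_for_build(changes):
    """Pick 'ready' changes in check-in order for the next build.

    `changes` is a list of dicts in check-in order, each with
    'id', 'state' and 'needs' (ids of prerequisite changes).
    A ready change is held back if any prerequisite is neither
    already selected nor being selected in this pass.
    """
    available = {c["id"] for c in changes if c["state"] == "selected"}
    selected, held = [], []
    for change in changes:
        if change["state"] != "ready":
            continue
        if all(need in available for need in change["needs"]):
            selected.append(change["id"])
            available.add(change["id"])
        else:
            held.append(change["id"])
    # The returned dict doubles as a very small build record.
    return {"selected": selected, "held": held}
```

In the typical case every ready change is selected and the `held` list is empty. The list only fills up in the exceptional case, for example when a developer has pushed a change to ready while a change it depends on is still only checked in.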

There are a lot of places where the CM tool is going to be able to help out, from the automatic generation of Make/Build files, to promotion violation notices, to support for incremental builds. Some tools (e.g., CM+) will even eliminate the need to build intermediate library archives/dlls in order to do object code sharing.

In the end, we've assumed that changes are flowing through the system, pretty much in the order that they've been made - with a bit of allowance for developer push to the ready state to co-advance a set of changes. This has eliminated most of the branching and merging work, especially as the developers realize that, with no reason to hold up a check-in operation, the frequency of parallel check-outs drops dramatically.

Still we have to allow for disruptive changes. There already is a much better chance that unit testing is going to be done against a more up-to-date and sequentially growing build environment, since there are few, if any, merge operations that need to be completed. What happens when a change breaks the build, though?

Usually, because we are doing frequent integrations, we can simply roll back the change with no impact on other changes. In some cases, we may need to roll back dependent changes as well. But a roll-back operation is simply a roll-back of the state of the change(s) (i.e., their promotion levels), followed by a rebuild.
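As a rough sketch of that roll-back, assuming the tool can tell us which changes depend on which, the set to demote is the broken change plus everything that transitively depends on it. The function names and the `checked_in` level are hypothetical.

```python
def rollback_set(broken, dependents):
    """Return the broken change plus all changes that depend on it,
    directly or indirectly.

    `dependents` maps a change id to the ids of changes that depend on it.
    """
    seen = set()
    stack = [broken]
    while stack:
        change_id = stack.pop()
        if change_id in seen:
            continue
        seen.add(change_id)
        stack.extend(dependents.get(change_id, []))
    return sorted(seen)


def demote(levels, change_ids, to_level="checked_in"):
    """Return a copy of `levels` with the given changes demoted.

    No file revisions are touched; only promotion levels change.
    """
    return {cid: (to_level if cid in change_ids else level)
            for cid, level in levels.items()}
```

After the demotion, a rebuild from the view at the higher promotion level no longer picks up those changes, while the checked-in revisions stay where they are for the developer to fix.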

If we can't afford to roll back some changes, we can fix them through a subsequent change, promote that change and do a rebuild. If we want to allow subsequent changes to the rolled back functionality to continue because the roll back will be for an extended time, we change the branch tip to reflect the latest acceptable functionality by creating a new change which checks-in older acceptable revision(s) and promoting it. Then new changes may continue against this tip. The rolled back functionality will have to be reconciled forward at a later time.

The big difference here is that we are dealing with an exceptional case. We are incurring additional overhead in this case (really comparable to the branch, merge, label strategy overhead), but we expect to infrequently have to deal with this kind of exception.

More importantly, the file history is much easier to comprehend. The complex branching strategies that have been worked out, reviewed and approved ahead of time, and then mapped onto a number of default context view specifications disappear. The CM tool can easily tell you when you need to branch (assuming it is tracking the product road map, past, present and planned). And because the complexity is reduced, incremental integration is that much easier.

Although many of you old-timers who have religiously used branches for promotion level management may be skeptical, it's time to take a closer look. How much time do you spend doing branching, merging, labeling, creating branch and label strategies, creating view configurations? More to the point, how many incremental integration opportunities are either overlooked or scaled back because of the expected overhead?

Does Your CM Tool Support Incremental Integration?
Before finishing, I want to point out additional capabilities that are lacking in most tools, which would be useful in supporting incremental integration.

·         Incremental impact analysis: There are tools that will scan in your source files and allow you to do some basic impact analysis. If I change file A, which files are impacted? What we really need to ask is: If we promote these changes for an integration build, which files are impacted? Most of the time the impact of our incremental change set will be small, but we want to be able to identify when it will be large and control change selection; this is not done to eliminate a change, but rather to better time its promotion. This is equally important to a developer in an incremental build environment as it is to a build manager.

·         Incremental build tracking:  If you're doing frequent incremental builds (or even full re-builds based on an incremental set of changes), creating baselines for each build (and perhaps variants) will be expensive. Baselines, by their definition, provide a comparison point, a point of reference, a baseline against which builds can be compared. They serve as important reference points and hide the concept of changes since they are defined as a compatible set of revisions. CM tools need to support build tracking in a baseline plus changes manner. First of all, the build definition refers to a baseline. Then it specifies a set of changes applied to the baseline. This is good for build variants, customizations, etc., but it is really ideal for incremental builds on a baseline reference. Not only is there a common reference point for easily comparing recent builds, but the definition clearly enumerates the list of changes applied to the baseline. This is friendly to incremental integration as it is easy to see what changes went into each build. The better tools will allow you to track which builds are incremental versus those which are variants or customizations, etc.

·         Removing timestamp dependencies:  Most CM tools will rely on makefile functionality (or similar) and their associated time stamp comparisons to determine what needs to be re-built. This can normally be done in a safe way, but sooner or later you get caught by it - usually in a developer role, but occasionally in a system builder role. A CM tool should be able to easily generate a build script, or at least a data file, which indicates what has to be compiled, not based on timestamps, but based on the set of selected changes and the impact they have on other files - ideally using the build definition record.

·         Incremental changes to a build environment: Once you have an incremental build definition, you want to be able to update your environment in order to do your build operation. CM tools need more flexible file retrieval capabilities so that it’s easy to populate, not based on a set of problem numbers or a baseline definition, but on the changes and the impact of the changes. Older proprietary tools (e.g. PLS and SMS) support a file expression capability for identifying and retrieving the correct set of files in one simple operation. Very few modern CM tools (e.g. CM+) support this same capability.
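Several of the capabilities above fit together: a build defined as a baseline plus changes, with the affected files derived from the selected changes rather than from timestamps. The sketch below assumes the tool already knows which files use which (the `used_by` map) and which files each change touched; all names here are hypothetical.

```python
from collections import deque


def impacted_files(changed, used_by):
    """Return the changed files plus every file that uses them,
    directly or indirectly.

    `used_by` maps a file to the files that include or depend on it.
    """
    impacted = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in impacted:
                impacted.add(user)
                queue.append(user)
    return impacted


def build_definition(baseline, change_ids, change_files, used_by):
    """Describe a build as a baseline plus an explicit list of changes,
    and list the files affected by those changes."""
    changed = set()
    for change_id in change_ids:
        changed |= set(change_files[change_id])
    return {
        "baseline": baseline,
        "changes": list(change_ids),
        "affected": sorted(impacted_files(changed, used_by)),
    }
```

The `affected` list answers the impact question for a proposed change selection before anything is promoted, and the same list can drive file retrieval and recompilation for the incremental build. A long list is a signal to reconsider when to promote a change, not whether to.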

Reporting and Incremental Integration
Finally, it all comes down to knowing what you're doing and what you've done. Even more important than release note generation, I would claim, is the ability to be able to report to the system integration, development and test teams, what has changed in each iteration of incremental integration. In the later stages of a release, this reporting is equally important to the change control board and to the build managers.

When an incremental build is ready for sanity testing, the first focus is on how well the build initializes. Then, the focus moves to changes in the functionality. You want your teams to have a list of problems resolved and features advanced so that they can first test these out and perhaps establish (or select) an incremental set of test cases to be run. You don't have time to run a full regression test, typically, at least not before you make a go/no-go decision for moving the development environment onto the new build. But it sure is nice if you can quickly focus on fixes and feature areas addressed by the incremental build, because these are the most likely areas where problems will occur.

Make sure your tools and processes allow for automated and effective reporting. You can't spend a few hours at it because in a few hours you may well be processing your next integration cycle.
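If builds are recorded as a baseline plus a list of changes, this kind of report is cheap to produce. The sketch below is one hypothetical way to do it; the record layout and the `build_delta` name are assumptions, and a real report would pull problem and feature titles from the change packages.

```python
def build_delta(previous, current, change_info):
    """List what changed between two builds on the same baseline.

    Each build is a dict with 'baseline' and 'changes' (a list of
    change ids). `change_info` maps a change id to a short description
    of the problem fixed or feature advanced.
    """
    if previous["baseline"] != current["baseline"]:
        raise ValueError("builds do not share a baseline")
    added = [c for c in current["changes"] if c not in previous["changes"]]
    removed = [c for c in previous["changes"] if c not in current["changes"]]
    lines = [f"+ {cid}: {change_info[cid]}" for cid in added]
    lines += [f"- {cid}: rolled back" for cid in removed]
    return lines
```

The added entries tell the test team where to focus first, and the removed entries tell them which earlier fixes are no longer in the build.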

Where to From Here?
Though I've not actually made a case for incremental integration here, it's a way of life at Neuma. If you're going to embark on it, make your life easy:

·         Use tools that support change packages

·         Consider using a per-release-stream branching strategy

·         Find a process/tool that helps you to avoid a branch-per-promotion-level strategy

·         To define incremental builds, use a baseline plus a set of changes

·         Use a tool that gives you adequate reporting

·         Manage your risk up front, even before stream development starts

Then you'll make incremental integration work for you, rather than vice versa. Instead, you can tackle the bigger problems, like automatically calling the developer who introduced compile errors, improving your sanity [testing] and getting time off for the holidays.

StickyMinds is a TechWell community.