What a Fragmented Industry Gets Wrong with SCM Standards


April 19, 2010

Summary

In his CM: The Next Generation Series, Joe Farah writes that one of the biggest problems with software configuration management (SCM) standards is that the industry is currently too fragmented. Sure, there are many ways to do things and plenty of high-level standards out there, but as a whole, the industry uses different terminology for the most basic concepts and fails to understand that standards must go beyond ability and push the industry forward.

One of the biggest problems with software configuration management (SCM) standards is that the industry is currently too fragmented. Sure, there are lots of ways to do things and plenty of high-level standards out there, but as a whole the industry uses different terminology for the most basic concepts, and fails to understand that standards must go beyond ability and push the industry forward. I do like the maturity model as a framework for helping to move forward.

The high-level standards are primarily process standards: make sure you identify every component, manage and control change, have a means of tracking the state of your CM artifacts, and ensure that you can prove that what you build is what was asked for. These are important but very high-level criteria. The problem has been that, as we look deeper, there is greater divergence: there are various basic capabilities without guidance.

For example, it's fine for an SCM standard to say that all revisions of a file, once registered in a CMDB, must be available for all time. It’s fine to say that parallel branches of revisions must be supported. These capabilities, though, without guidance, can lead to poorer SCM. Just look at those projects where branching and/or labeling have gotten out of control.

Now, one of the more frustrating things I find is the lack of a basic standard for terminology. What do you mean by status accounting? Or, even worse, is it a change package, an update, a change set or a change? Maybe it's time to start nailing things down a bit. Of course, if we try to do this, we're in for a real battle. That's just the nature of standards.

Perhaps there are some of you who would like to add your opinions so that we can somehow march toward some better standards over time. I'm not going to try to bite off the whole standards issue in one article. It's just too big.

Terminology
So perhaps, to simplify the SCM process standards discussion, I'll focus on terminology. I hate to leave behind the "push the industry forward" part, but for now I will, and I'll address it in a future discussion on standards.

Terminology is such a crucial part of a standard and often the most visible component. Lack of consistent terminology can cause confusion even with the best standards. Even if the standards clearly define what the terms mean, if the definitions are unnatural or divisive, the standard, clear as it might be, might lead to overall confusion and rejection.

Now terminology itself is a large domain. There are things (changes, revisions, etc.) and there are actions (identify, audit, build, release, etc.). Some even have the same name (build versus build, release versus release). As actions tend to act on things, I'll focus first on things.

Below I present a bunch of terms used in SCM. I'm sure I don't have all of the terms used in all processes and tools around the world, but hopefully I've captured a number of them. In some cases, term name standardization is required, while in others, term definition standardization is required. I selected my preferences, and, more importantly, I've given my reasons for these selections. You may have different preferences and better reasons. Great! But let's hear from you.

Name Standardization
There are a lot of terms used to describe basic concepts. We need to drastically reduce this set of terms (not concepts):

Change set, update, change, change package
Revision, version
Variant, version, generic, option
Problem, defect, bug, issue, problem report, non-conformance
Feature, activity, task, to do item
Request, change request, issue, feature request
Item, artifact, object, file, document, datum (data), record
Iteration, cycle, internal release, life cycle
Stream, development stream, release stream, release branch, main line, product branch
Configuration, alignment, baseline
Build, build record, build notice, BOM
Context, views, context views
Service pack, increment, update package, change supplement

Definition Standardization

Product requirement versus customer requirement versus design requirement
Product versus project versus IDE project
Process flow versus work flow versus state flow
Configuration versus baseline

There are a lot more, but I'm going to start with this subset.

What I'll do in each of the next sections is identify a set of terms that are used and give my reasons why I prefer some over others. Hopefully, we can follow up with a thread on CM Crossroads and move toward more standard terminology. I know it's hard for some tools to change terminology, but perhaps over time, if we have some agreed-upon terms, we can move there. The process descriptions will then at least seem less tool-dependent.

Change Package
Change, update, change set, change package. A change, in this context, is a modification of a set of files/data elements for a specific purpose. It doesn't have to be source files. It can be requirements. It can be documents. Maybe it should be a coherent set of objects for any given change, so that the "modified revisions" are a homogeneous set of objects.

The problem with the word change is that it is a loose term used in various contexts. A change review board looks at change requests. A developer creates a change, which is a modified set of source code files for a specific purpose. Yet the word is a natural fit and I would really prefer to use it if I didn't think it would cause confusion.

The word update was used by Neuma and chosen because it was originally used by IBM. It was more specific then. These days it seems to have a new meaning, as a dynamic change to a program (usually applied over the Internet).

Change set was originally coined by Aide de Camp (subsequently bought out) and meant a set of source code modification directives that could be included in or excluded from a particular configuration. That's different from the changeset used by Microsoft and others. Change package is bulky and also carries the connotation of package, implying that it's really a collection of changes, or at least a recursive concept, in my mind. It fits well when you're used to just managing files, but if you've been using changes in CM for over 30 years, the word package tends to be unnecessary verbiage.

My vote: change set, update, change (in that order), but not change package.

Revision or Version
Revision or version? Let's say that a file is changed and then put back into the repository. It's still the same file, just a modified version of it. I used to use the term version, but for a parallel connotation, not a sequential one. I've dropped the use of that term altogether now because it is too ambiguous. Which version do you have: English or French? In my eyes, this is very different from 1.1 or 1.2. This causes confusion, especially because most software deliverables come in multiple versions. So I prefer the more exact term revision, which is also used by hardware folks and is more telling of what is going on.

My vote: Revision

Variant
Variant, version, generic, option. What do you use for parallel versions? I'm not talking about parallel releases, but separate product configurations stemming from a given baseline. Like the military version versus the commercial version. Or what about the Spanish version versus the English or French versions? Using the word version is still confusing because it is sometimes used to mean revision. I've seen the term "generic" used at times, but this is normally for larger variations (like military/commercial) and is simply not a well-understood term.

Variant seems to me to hit the nail on the head. It's a variation of a standard offering. Indeed, there can be dozens of variations. Option is a good word, but it is really a selection of several options that determine a variant. A French military variant is created by selecting the military option and the French option. Make your life easier by deferring as many option selections as possible to run-time, thus reducing the number of variant deliverables.
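To make the option/variant distinction concrete, here is a minimal Python sketch, assuming a variant can be modeled as one selection from each option group. The option groups and the `variants` helper are hypothetical illustrations, not part of any CM tool. It also shows why deferring option selections to run time reduces the number of variant deliverables: every option group resolved at build time multiplies the count.

```python
from itertools import product

# Hypothetical option groups; a variant is one selection from each build-time group.
OPTION_GROUPS = {
    "market": ["commercial", "military"],
    "language": ["English", "French", "Spanish"],
}

def variants(groups, runtime=()):
    """Enumerate build-time variants; groups named in `runtime` are deferred to run time."""
    build_time = {name: values for name, values in groups.items() if name not in runtime}
    names = sorted(build_time)
    # Cartesian product of the remaining option groups, one dict per variant deliverable.
    return [dict(zip(names, combo)) for combo in product(*(build_time[n] for n in names))]

print(len(variants(OPTION_GROUPS)))                        # 6 variant deliverables
print(len(variants(OPTION_GROUPS, runtime={"language"})))  # 2 once language is chosen at run time
```

The French military variant here is simply `{"language": "French", "market": "military"}`, i.e., the military option plus the French option.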

My vote: Variant

Problem
Problem report, problem, defect, bug, issue, non-conformance. When a problem is found with the way software behaves, there are many different words we use to identify it. Really, what we have is either an error in the original specification, or a non-conformance with that specification. Non-conformance is fairly accurate, though a bit of a mouthful, and doesn't deal with specification errors. Issue is used heavily in the military, and outside of it too. However, issue does not necessarily denote a problem with the product. For example, consider a pricing issue or a support issue. It's more a statement that the customer is not satisfied. I like the term problem report, but it's a mouthful and can be confused with a report about problems as opposed to a statement of a problem. Bug is short and historical, but not very friendly once you move outside of the developer/tester domain. Problem, as in problem tracking, seems to have a lot of support, and defect as well, though to a lesser extent.

My vote: Problem, defect, bug (in that order)

Activity
Feature, activity, task, to do item. What do we call the things we do to a product other than fixing non-conformance with the original specification? They're features, right? Well, usually. Sometimes they're refactoring activities. Sometimes they're code instrumentation. Sometimes they're creating basic architectural components upon which features will be built. Feature is a good term for what the end user will see, and it is an important and useful term. To do item is more general, but perhaps too much of a bookkeeping item. Perhaps task or activity fits better.

I have a bias here. I like these two terms because, unlike software problems, which are too numerous and usually too simple to fix to justify planning each one, tasks and activities require some definite specification and expected effort. In other words, they fall well into the project management and product management domains. The former has to deal with scheduling/tracking the work, whether in an agile or traditional manner, while the latter has to deal with what is being completed for the next release(s) of the product. So I think it's important to tie these into the CM world to ensure we don't have separate project management and CM databases, processes and efforts.

My vote: Activity, task

Request
Request, change request, issue, feature request. So what about a customer issue? How do we name it? What happens when the customer needs something done? We track it as an issue, or a request. Sometimes it's a change to the requirements, but it is the impetus for such change that we're considering here.

There are a few things to keep in mind here. One, the request is sometimes for a feature, sometimes for a problem, and the customer doesn't always know which. Two, the same request can come from multiple customers. Three, the customer may be an internal user or even a developer. Four, the request does not always require the vendor to do something, other than perhaps pointing to the correct page of the documentation.

Issue is a good word, but it implies problem. Feature request implies feature. Change request is good except that it doesn't always imply that change is required. So I like simply request, although I don't mind change request as long as there is no confusion with the change itself. A request lies in the customer domain. A problem or activity lies in the product development domain, where duplicate requests and requests that need no change have been filtered out.

My vote: Request, change request

Requirement
This one seems fairly straightforward. We don't have other terms for it, as long as we understand that a request may be an impetus for requirement change as opposed to a requirement itself. Alas, although the choice of term is fairly standard, the meaning tends not to be.

There are basically two camps. One says a requirement is a requirement on a product. The other says a requirement is a requirement on someone doing work. In the latter case, we start with product requirements, and then map them into the design phase and allocate new requirements for the design team. This may be a single-step allocation, or there may be multiple steps (e.g., from high-level design to software detailed design).

There's a big difference between a product requirement and a design requirement, though. A product requirement is a marketing input. It is either part of a contract or part of a market standard. A product team doesn't control product requirements, though it does control everything derived from them. In fact, it has to do project management on everything derived, and as such, it would be best to turn the derived artifacts into project activities rather than having parallel activity and requirement artifacts for product management and project management. Note that there is a lot more data associated with planning and tracking activities than just the requirement description and a few attributes such as weight.

Still, no matter how hard we try, the requirement flow-down (i.e., allocation) process is deeply rooted in many projects. As such, it might be unreasonable to expect the terminology to ever change. So instead, I recommend prefacing the term:

1) Customer requirement: A requirement expressed by the customer.
2) Product requirement: A requirement for how the product will function, taking into account customer/market requirements.
3) Design requirement: Any lower-level requirement allocated from a product requirement.

In this scenario, I recommend that requirement alone imply product requirement, that is, a requirement on how the product will function. Note the difference between customer and product requirement. A customer may request the ability to automate backups once a week. A product requirement might instead say that automated backups can be done according to a custom schedule, which might include weekly. Both of these are product requirements (i.e., statements of how the product must/will/should function), but the latter is applicable to a broader market.

Also, be careful of those who distinguish between allocated requirement levels based on the amount of detail. Product requirements and design requirements both need full detail. It's just that one details how a product functions, from a black-box perspective, while the other details the implementation architecture used to build the product. True enough: a product might have sub-products, each with its own set of product requirements. Again, this is different from design requirements. For example, a product might need a million pounds of thrust, and it might involve several engines, products in themselves, as sub-products. The design, though, identifies whether four 250,000-pound-thrust engines are to be used, or five 200,000-pound-thrust engines.

Now despite all of this discussion, there are still numerous labels on requirements: system requirements, software requirements, high-level requirements, subsystem requirements, etc. What is important here is whether these terms represent product or design requirements, and what camp you're in, because that dictates what you mean by requirement.

My vote: Requirement, with a prefix when it is not a product requirement

Artifact
Item, artifact, object, file, document, datum (data), record. So what are the things we're managing in our CMDB? Files? If we're only managing files, we're a first-generation CM process. We have to manage all sorts of data, in different sizes, shapes and forms. Some need revision control, while this is overkill (read: overly complex) for things like requests. Some need change control. Well, maybe all do to some extent, and some are just stored (e.g., attachments to a problem).

It doesn't really matter what we call these things. We're not going to get consistency here any time in the near future, because different things have different properties: outputs of a process, for example, are referred to as artifacts. Good name, but so is item, as in a list of items making up the database subsystem. Object is not a good choice in a software environment because of the obvious other connotation. Document is all right as long as it's used in the traditional sense only; otherwise it is simply too confusing to the average reader. Similarly, data is fine if used in the small-piece-of-data sense and not the large-object sense, which rarely comes to mind when the word is used. Perhaps record is a bit more inclusive in this case, but from a relational perspective, large objects are not really historically part of a record.

The thing to avoid is taking an existing term with historical meaning and trying to stretch it into a wider (or narrower) meaning. This causes confusion.

My Vote: Item, artifact, record (in that order), but any of the others only in their traditional contexts

Iteration
Iteration, cycle, internal release, life cycle: the development cycle. What do we call it? Life cycle used to fit perfectly, with a new life cycle for each release. Then came agile. Now life cycle clearly carries the meaning of a traditional life cycle. Cycle is a bit too generic. Iteration is good, but it does currently tend to imply agile methods. Still, there's no reason why, in a traditional shop, each release cycle couldn't be considered an iteration. Internal release is OK, except that it tends to imply a release, just not a real one. A release, though, is really an internal release that is followed by full verification, audit, product documentation and other packaging. So we'll stick to dealing with the development cycle in this definition.

My Vote: Iteration, internal release (in that order)

Development Stream
Stream, development stream, release stream, release branch, main line, product branch. So what do we call the sequence of development work done leading up to a release? This may involve one or multiple iterations. It's really a stream of work activities including design, implementation, testing, documentation, etc. So I like the term development stream, but that's a mouthful. I also like release stream, though it's also a bit long, so I like the shorter term stream. Some like to use the term release branch, generically encapsulating all of the work done along a development branch, and others main line, particularly if the release branch is the main trunk for the source code.

The problem with both of these terms is that there is far more than source code in a development stream, and release branch and main line simply don't suffice. For example, there are requirements, activities, problems, documentation, source code, test cases, test results, and so forth. Some might argue that there should be branches for each of these things, but I think that's trying to shoe-horn a natural process into a square container. Yes, requirements need branches (e.g., release 1 requirement tree versus release 2 requirement tree). So do some forms of documentation and source code. Activities should be specific to a development stream without branching. Problems might apply to multiple streams, but certainly it's the solution or fix that is tracked against each.

Depending on the granularity of your test cases, it might be natural (coarse granularity) or overkill (fine granularity) to have branching, although certainly some fine-grained test cases will apply to specific releases. The more natural approach here is to tag fine-grained test cases with the streams to which they apply. Similarly, activities and changes should be tagged with the stream to which each one applies.

There's a bit more in this case, though. Unlike the other terms, this term involves evolution over time. So stream history, or branch history, is important. For example, a change made in release 2 will, in the vast majority of cases, be applicable to releases 3 and 4. A problem fixed in release 2 will not appear in releases 3 and 4. This assumes that release 3 is derived from release 2 and release 4 from 3 (or at least 2 in this example). This is an important term. I like stream and stream history. What it comes down to is that the stream history is the branch history of the product container, whatever that may be in your CMDB/process/tools. That is, how does my product evolve?

The reason I like stream history (and stream) over branch history is that most, though not all, projects use a branching structure that does not reflect stream history. Hence, branch history is a whole other consideration that's more applicable on a file-by-file (item-by-item) basis. There is a rather recent problem with the word stream, though: it has a different connotation in some existing tools (e.g., AccuRev). So maybe product branch might be a good alternative. Note that this term must implicitly be placed in the context of a product (or product hierarchy, or whatever is released together).
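A minimal sketch of stream history, assuming each stream records only the stream it was derived from (the `STREAM_PARENT` table and the `downstream` function are hypothetical). Given the stream in which a change was made, it lists the downstream streams to which the change is a candidate for propagation, matching the release 2, 3, 4 example above. It says nothing about whether the change actually applies; that judgment stays with the team.

```python
# Hypothetical stream history: each stream maps to the stream it was derived from.
STREAM_PARENT = {"r1": None, "r2": "r1", "r3": "r2", "r4": "r3"}

def downstream(stream, parents):
    """Return every stream derived, directly or indirectly, from `stream` (breadth-first)."""
    children = {}
    for child, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)
    result, pending = [], list(children.get(stream, []))
    while pending:
        current = pending.pop(0)
        result.append(current)
        pending.extend(children.get(current, []))
    return result

# A change made in r2 is a candidate for propagation to r3 and r4.
print(downstream("r2", STREAM_PARENT))  # ['r3', 'r4']
```

Because the walk follows stream derivation rather than per-file branches, it reflects stream history even when the underlying branching structure does not.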

My Vote: Stream, development stream, release stream

Product versus Project
Product, project. We all know what a product is and what a project is, don't we? So why would these two terms cause confusion in a development shop, some may ask. Others know. The primary reason is the use of the term project in IDEs, such as Eclipse and Microsoft's Visual Studio. The IDEs want to focus on the project that is being completed as part of the development effort. But in the end, the IDE project outlasts the project. In fact, product would have been a better word, or perhaps deliverable, deliverable set, or package. An IDE project creates some deliverable(s). When the release is all done, the IDE project persists to create the deliverables for the next release.

Traditionally, a project is a managed set of activities, and this set of activities frequently culminates in a release, be it a new internal/external release, a variant release or a milestone. The project is identified by a work breakdown structure which outlines the activities to be performed. An IDE may be used for some of these activities (and a wider set in more recent IDE releases), but the IDE project is a misnomer.

A product is a collection of deliverables which together meet a product specification. A product may be a hierarchy of products, with sub-products being reusable across multiple products. So here's my take:

My Vote: Product, project and IDE project

1) Product: A collection of deliverables which meet a product specification
2) Project: A managed set of activities culminating in a release or other milestone
3) IDE Project: The term used by some IDEs for their product file collections

Process Flow
Process flow, work flow, state flow. I'm not really sure if there is confusion about these terms in the industry. It's hard to tell. But just to be sure, here's my take on them.

State flow (also known as state-transition flow). State applies to an item/artifact. As it moves through the process it assumes different states which indicate who may act on it and what actions are permitted. For example, a problem moves from its original state through a triage process that puts it in one of several states. Eventually the problem gets fixed or answered and reaches a closed state. Only the originator of the problem can agree that it is closed (i.e., the original problem has been addressed, and this would only be done once the problem reached some sort of fixed and tested or answered state).
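The state flow for a problem can be sketched as a transition table that records, for each move, which role may make it. The states, roles and the `transition` function below are hypothetical and simplified; a real process will have more states. As described above, only the originator can move a problem to closed.

```python
# Hypothetical state flow for a problem: each state maps the states it may move to
# onto the role permitted to make that move.
PROBLEM_FLOW = {
    "open":     {"assigned": "triage", "answered": "triage"},
    "assigned": {"fixed": "developer"},
    "fixed":    {"tested": "tester", "assigned": "tester"},       # a failed test goes back
    "tested":   {"closed": "originator"},
    "answered": {"closed": "originator", "open": "originator"},   # originator may reject the answer
    "closed":   {},
}

def transition(state, new_state, role, flow=PROBLEM_FLOW):
    """Return new_state if the flow allows `role` to move a problem from `state` to it."""
    allowed = flow.get(state, {})
    if new_state not in allowed:
        raise ValueError(f"no transition from {state!r} to {new_state!r}")
    if allowed[new_state] != role:
        raise PermissionError(f"{role!r} may not move a problem from {state!r} to {new_state!r}")
    return new_state

state = "open"
for target, role in [("assigned", "triage"), ("fixed", "developer"),
                     ("tested", "tester"), ("closed", "originator")]:
    state = transition(state, target, role)
print(state)  # closed
```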

Work flow. Work flow shows how some input flows through the system to cause some output. For example, a request may cause a feature activity to be created which may cause a change to requirements. The activity may lead to the creation of product changes, and test cases which are to be run on the changed product. The test results might then cause the request to reach an implemented state and then a release might result in the request being marked completed and ready for closing by the originator of the request. Work flow documents the interaction of items/artifacts according to the planned process. It highlights traceability actions that must be performed (hopefully automatically in most cases). It does not focus on the who, as the state flow deals with that.

Process flow. This is a broader term that encapsulates both state flow and work flow. It also involves other aspects such as consolidation of work flows into an overall release process. For example, milestones are met by performing regular builds which collect all (or a designated subset) of changes which have been readied by the development team for a specific product development stream.

Configuration
Configuration, alignment, baseline. I don't think there is a lot of confusion here, but let's clarify some things. A configuration is an aggregation of a set of item revisions. If the items are requirements, we get a requirements configuration. If the items are source code files, we get a source code configuration. You might want to aggregate unlike items, but let's not go there right now. A configuration typically consists of a consistent set of items (i.e., item revisions).

The process used to create this consistent configuration is sometimes referred to as configuration alignment, which is aligning the right revisions into a configuration. For this reason, you might find the term alignment used as a synonym for configuration.

Some might also use the term baseline as a synonym, but this is not valid. A baseline is a frozen (i.e., never to be changed) configuration. Configurations can change, and dynamic configurations, expressed as a set of rules for what should be in the configuration, change as the data in the CMDB changes. Now there may be tools that only permit you to create a configuration by creating a baseline, and in that case the configuration cannot change either. A baseline never changes. It is a frozen configuration, hopefully, but not necessarily, a consistent one. The purpose of a baseline, as its name implies, is to measure change. How much have we moved on from the baseline? What's the difference between the item as it is now and how it was in the baseline?

My Vote: Configuration and baseline

1) Configuration: An aggregation of a set of item revisions
2) Baseline: A frozen Configuration
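The configuration/baseline distinction can be sketched in Python, assuming a configuration is simply a mapping from item names to revision identifiers. Freezing a copy of it into a read-only snapshot gives a baseline, and the baseline is then used to measure how far the configuration has moved. The `freeze` and `diff` helpers and the item names are hypothetical, chosen for illustration.

```python
from types import MappingProxyType

def freeze(configuration):
    """Return a read-only snapshot (a baseline) of a configuration: item -> revision."""
    return MappingProxyType(dict(configuration))  # copy first, so later edits don't leak in

def diff(baseline, configuration):
    """Items whose revision differs from the baseline, as (baseline_rev, current_rev)."""
    names = set(baseline) | set(configuration)
    return {n: (baseline.get(n), configuration.get(n)) for n in sorted(names)
            if baseline.get(n) != configuration.get(n)}

config = {"main.c": "1.3", "util.c": "1.1"}
baseline = freeze(config)
config["util.c"] = "1.2"   # the configuration keeps changing...
config["new.c"] = "1.1"
print(diff(baseline, config))  # {'new.c': (None, '1.1'), 'util.c': ('1.1', '1.2')}
```

Any attempt to assign into `baseline` raises `TypeError`, which mirrors the rule that a baseline never changes.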

Build
Build, build record, build notice, BOM. So what, then, is a build? Well, build has three different connotations: an action, the result of a build action, or a record of a build [1] action. Even the action sense is often used as a noun: I'm doing a build [1].

A build [1] takes a configuration and transforms it into a set of (potential) deliverables (i.e., the build [2]). Most often a build [1] is done against a non-frozen configuration; that is, developers create their own builds [2] to test their in-progress changes. It's up to the developer to know what incremental changes are being included, and this is done through a workspace mechanism. When a system build [1] is done from the CM repository (i.e., CMDB), though, it is often important to record exactly what was being built so that if there are problems they can be reproduced, fixed and retested. Some shops do system builds more than once a day, though. Does this mean that a new baseline has to be created more than once a day in order to track what went into the build? Some shops actually do this. However, if the build is to reflect a baseline, does that mean that all 12 variants of that build require 12 separate baselines? Again, some shops actually attempt to do this.

But a build [3] record is a record of what was built. It is not a baseline. I can build the same baseline with two different versions of a compiler to ensure that both versions give the same result. Same baseline, different builds. The important thing about a build [3] record is that it identifies:

The baseline used
Changes used with respect to that baseline
The variant options used
The tools/processes used to perform the build

You can see that several builds can be performed from the same baseline. In general, you "manage the baseline and build the subset". That is, your baseline is an aggregate of all of the possible components that can be used for a build. You select variant options to customize what you're building from the baseline. You add in some changes, which may reflect a new variant option but more frequently fix problems and/or add or extend features. You identify the revision of your tools/processes that are being used for the build. So you might create a baseline twice a month, but do builds on that baseline, adding in all other ready changes, a couple of times a day, or multiple times for different variants. Developer builds are (typically) not tracked so rigorously by the CMDB. A build [3] record is often referred to simply as a build [3]. A future build [3] is sometimes referred to as a build notice (similar to change notice in the hardware world) or in the past as a build record.
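A build [3] record can be sketched as an immutable record holding the four elements listed above. This is a minimal illustration, assuming plain string identifiers for baselines, changes, options and tool revisions; the `BuildRecord` class and every identifier in it are hypothetical. It shows the earlier point that the same baseline built with two compiler revisions yields two different build records.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BuildRecord:
    """Hypothetical build [3] record: what was built, not a baseline."""
    baseline: str                      # identifier of the baseline used
    changes: frozenset = frozenset()   # changes applied on top of that baseline
    options: frozenset = frozenset()   # variant options selected
    tools: tuple = ()                  # revisions of the tools/processes used

b1 = BuildRecord("BL-42", frozenset({"c101"}), frozenset({"military", "French"}), ("compiler 4.4",))
b2 = BuildRecord("BL-42", frozenset({"c101"}), frozenset({"military", "French"}), ("compiler 4.5",))
print(b1.baseline == b2.baseline, b1 == b2)  # True False: same baseline, different builds
```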

Finally, BOM (bill of materials) is used in the hardware world to describe the set of part revisions required to build something. There are a few differences, though, so it might be wise to avoid this term.

My vote: Build, build notice, build record

Context View
Context, views, context views. There are a lot of products, a few development/support streams for those products, loads of changes, etc. in a CMDB. Navigating all of that data can be overwhelming. However, any decent CM tool will understand that you're interested in working in a particular context, and the context can be used to simplify the view of the data. Normally it's the context of a particular build record, or the latest checked-in source code of a particular product for a particular development stream. Sometimes it is a combination of a baseline or build plus a set of changes. Whatever it is, and however it is specified, the goal is to identify the context of your discussion, query, data search, or whatever.

When applied within the context of a tool, the "context" is referred to as a context, a view, or a context view. There are other terms used here and there, but these are the most dominant. For the most part, I think any of these three terms suffices, with context view preferred if there is any confusion. In particular, context is a fairly generic term that can sometimes cause confusion, which is why I quoted it in the first sentence of this paragraph. View is also a common database term, more specific than a context view, usually applying to form.
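For the case where a context is a baseline or build plus a set of changes, resolving the view can be sketched as overlaying each change's item revisions on the baseline. This assumes, for illustration only, that baselines and changes are both mappings from item names to revisions and that later changes win; the `context_view` function and the item names are hypothetical.

```python
def context_view(baseline, changes):
    """Resolve the item revisions visible in a context: a baseline plus ordered changes.

    baseline: dict item -> revision; changes: list of dicts item -> revision.
    """
    view = dict(baseline)      # never modify the baseline itself
    for change in changes:
        view.update(change)    # a later change's revision replaces the earlier one
    return view

baseline = {"main.c": "1.3", "util.c": "1.1", "spec.doc": "2.0"}
changes = [{"util.c": "1.2"}, {"main.c": "1.4", "new.c": "1.1"}]
print(context_view(baseline, changes))
# {'main.c': '1.4', 'util.c': '1.2', 'spec.doc': '2.0', 'new.c': '1.1'}
```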

My vote: Context view, context, view (in that order)

Upgrade Package
Service pack, increment, update package, change supplement. Lots of different terms are used to identify an upgrade package for something that's already been delivered. However, as this is really at the tail end of the ALM process, it won't have a big influence on the overall process or standards. Still, if we had to pick a term, I like update package, because it's clear, or increment, clear but a bit more general in nature.

My Vote: Update package, increment (in that order)

Summing Up
So there you have it. I've hit on some of the big ones, but there are still many to go. Please add to this article through the reply mechanism. What terms do you use/not use for CM artifacts, and why? Do you think we can ever attain common ground? We've been further apart than we are now. Some terms, like branch and test case, were agreed upon decades ago. Some may never get there.

Just to summarize my picks:

Change set, update, change
Revision
Variant
Problem, defect, bug
Activity, task
Request, change request
Customer requirement, (product) requirement, design requirement
Item, artifact, file, document, record
Iteration, internal release, life cycle
Stream, development stream, product branch, release stream
Configuration
Baseline
Build, build record, build notice
Context view, context, view
Update package, increment, service pack, change supplement

What are yours?


About The Author

Joe Farah

President and CEO of Neuma Technology, Joe Farah is a regular contributor to the CM Journal. Prior to cofounding Neuma in 1990, he was a director of software at Mitel. In the 1970s, Joe developed the Program Library System (PLS), still heavily used by Nortel (Bell-Northern Research), where he worked at the time. He's been a software developer since the late 1960s.