Do Your CM/ALM Tools Help Secure Your Development Assets?

Summary:
You're part of a very successful, growing software company. As you approach the office one morning, fire trucks out front indicate that this is not business as usual. Fortunately, you have nightly off-site back-ups. Unfortunately, you'll need equipment, software, and back-up recovery operations before things can get back to normal, perhaps in a few days and with limited data loss. Or maybe you've noticed data problems creeping into your development repository ever since the recent round of layoffs. Maybe a hacker got in. Maybe there was a critical disk crash. Or maybe a new software release has introduced data inconsistency. There are many ways your development assets can be compromised, so you need many avenues to secure them. Your CM and/or ALM suites are part of your development backbone - they must be up to the task of getting you back on your feet the same day.

Pick-up-And-Drop Technology Transfer

The first question I might ask is: do you have pick-up-and-drop technology transfer? Can you take your product and drop it into a new site? Whether it's disaster recovery or a company merger, you should have this capability. Where are your development assets? Scattered across developer desktops? Scattered across servers? Scattered across multiple sites?

Your development assets include everything from product inception to customer tracking. These are all critical to development. They do not start at the software design group and end in the verification group. They also include the tools and platforms required end-to-end - including your customizations. If you can easily pick up your product and drop it into a new project setting, you're in pretty good shape.

Backup Basics
Another measure of your situation is your backups. How complex is your backup process? Does it include information that rests on a developer's desktop (or notebook), and if not, what is your exposure here? Does design documentation sit on a developer's disk? What about work in progress - how easy is it to shelve it? Does your CM/ALM tool allow you to do consistent backups, or do you need a lot of post-recovery maintenance to get things back on track? Is your ALM suite spread across multiple servers? Where does the glue reside between components?
If you run a multiple site development shop, does consistency extend across multiple sites? Can you easily relocate one site, or perform data recovery at one site, without disrupting the others?

Beyond Backups
Backups generally provide a way forward, a means of recovery. But still, precious time is lost in the recovery process. There are hardware solutions, such as RAID (disk arrays), and hybrid solutions such as disk mirroring. Some CM/ALM tools allow specific mirroring, of both source files and meta-data files. The advantage here is that a smaller portion of the disk (typically specific directories) requires mirroring, improving bandwidth and reducing disk space requirements. Another advantage is that the tools will typically allow you to recover quickly or even automatically from a disk outage.

Another means of recovery supported by some transaction-based tools is to replay the transactions processed since the last checkpoint/backup. If transactions are located on a different disk from the rest of the repository, this can add some redundancy to your backup strategy.
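To make the replay idea concrete, here is a minimal sketch, not any particular vendor's mechanism. The `Repository` class, its `journal` list (standing in for a transaction file kept on a separate disk), and the `("set" | "delete", key, value)` transaction shape are all assumptions made for illustration.

```python
import copy


class Repository:
    """Toy key/value repository with a separately stored transaction journal."""

    def __init__(self):
        self.state = {}
        self.journal = []          # stands in for a journal file on another disk
        self.checkpoint = ({}, 0)  # (state snapshot, journal length at snapshot)

    def _apply(self, txn):
        op, key, value = txn
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")

    def commit(self, txn):
        # Record the transaction first, then apply it to the live state.
        self.journal.append(txn)
        self._apply(txn)

    def take_checkpoint(self):
        self.checkpoint = (copy.deepcopy(self.state), len(self.journal))

    def recover(self):
        # Restore the snapshot, then replay everything committed after it.
        snapshot, position = self.checkpoint
        self.state = copy.deepcopy(snapshot)
        for txn in self.journal[position:]:
            self._apply(txn)


repo = Repository()
repo.commit(("set", "main.c", "rev 1"))
repo.take_checkpoint()
repo.commit(("set", "util.c", "rev 1"))
repo.commit(("delete", "main.c", None))

repo.state = {}   # simulate losing the repository disk
repo.recover()    # snapshot plus replayed journal restores the latest state
```

Because the journal survives independently of the repository data, losing one of the two does not cost you the work done since the last checkpoint.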

CM/ALM tools should not hiccup if there is a power outage in the middle of a critical operation. They should always recover gracefully. Any manual intervention required to recover will lead to productivity loss.

Data Sabotage:  Checkpointing and Roll Forward
For any number of reasons, someone may be tampering with your data. Maybe they delete a pile of data. Or maybe they more subtly morph the data so that the problems are not noticed for weeks or even months. To deal with this, your CM/ALM tools need to be able to perform three basic tasks: checkpoint your repository (or, in the worst case, take full backups); locate potential areas of data sabotage (or, for that matter, a typo or inadvertent action that went unnoticed) and repair the errant transactions; and roll the repaired transactions forward from the checkpoint.
Although it may seem that such a capability requires a heavyweight solution, these capabilities are not that difficult for CM/ALM vendors to design into their tools from the start. CM+, for example, allows very lightweight checkpointing (a few seconds) and even automates this. Given a checkpoint, you can actually view the repository as it was at the time of the checkpoint, in parallel with your running system. After editing transactions and placing the checkpoint in place as the current checkpoint file, the system will automatically replay transactions, preserving original timestamps.
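The three tasks above can be sketched in a few lines. This is an illustration under stated assumptions, not how CM+ or any other tool implements it: the `Txn` record, `find_suspects`, and `roll_forward` are hypothetical names, and the "repair" shown is the simplest one, dropping the errant transactions rather than editing them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Txn:
    timestamp: float       # original commit time, left untouched on replay
    user: str
    key: str
    value: Optional[str]   # None means "delete this key"


def find_suspects(journal, user):
    """Locate candidate transactions, here simply everything by one user."""
    return [t for t in journal if t.user == user]


def roll_forward(checkpoint, journal, rejected):
    """Rebuild state from the checkpoint, skipping rejected transactions."""
    state = dict(checkpoint)
    history = []
    for txn in sorted(journal, key=lambda t: t.timestamp):
        if txn in rejected:
            continue
        if txn.value is None:
            state.pop(txn.key, None)
        else:
            state[txn.key] = txn.value
        history.append(txn)   # replayed with its original timestamp
    return state, history


checkpoint = {"main.c": "rev 1"}
journal = [
    Txn(100.0, "alice", "main.c", "rev 2"),
    Txn(101.0, "mallory", "main.c", None),   # the sabotage: a deletion
    Txn(102.0, "bob", "util.c", "rev 1"),
]
state, history = roll_forward(checkpoint, journal, find_suspects(journal, "mallory"))
```

The legitimate work by the other users is kept, in order and with its original timestamps, while the damage is left out of the rebuilt repository.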


If this level of data robustness is a requirement, it's important to look under the hood at the architecture of your prospective CM/ALM solution. With a reasonable bit of discussion/reading, you should be able to familiarize yourself with both the capabilities and limitations, the latter often being carefully hidden within the marketing hype. A good solution will even provide you with an added layer of protection against software errors in the CM/ALM solution.

Hot/Warm-Standby Disaster Recovery
When a disaster arises, whether it's a disk crash or a natural or man-made disaster, ideally you'd like to continue operation with barely a glitch for your users. If you have a solution which can do this, you have hot-standby disaster recovery in place. If your users have to restart their application, but otherwise can continue from where they left off, you have a warm-standby capability. If your users are off-line for more than a few minutes, you will have some productivity loss. This may be more severe in mission-critical scenarios which cannot tolerate down-time.

One means of improving your recovery speed is replication. If your repository is replicated at another site, you can generally be up and running more quickly. Of course the tools must also be replicated. In the worst case, you may have some replication latency and lose a bit of data. If you have continuous replication/synchronization of your data between sites, you're in a good position to establish a warm-standby, and possibly even a hot-standby capability.


Replication can work from a repository perspective - identifying data differences and synchronizing them - or from a transaction perspective - processing the same transactions in the same order using the same tools at multiple sites. The latter generally requires less bandwidth and provides more site autonomy, but this will vary with each solution. If your clients can be easily (or better yet, automatically) switched to work with any site, you have at least a warm-standby capability.
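As a rough illustration of the transaction perspective, the sketch below applies the same numbered transactions in the same order at every site and lets a client read from whichever site is up. `Site`, `ReplicatedRepository`, and the in-memory `log` are hypothetical stand-ins. A real tool would ship transactions over a network and deal with latency and conflicts.

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.applied = 0     # sequence number of the last transaction applied
        self.online = True

    def apply(self, seq, key, value):
        # Enforce "same transactions, same order" at every site.
        if seq != self.applied + 1:
            raise ValueError(f"{self.name}: expected {self.applied + 1}, got {seq}")
        self.state[key] = value
        self.applied = seq


class ReplicatedRepository:
    def __init__(self, sites):
        self.sites = sites
        self.log = []        # ordered (seq, key, value) transactions

    def commit(self, key, value):
        self.log.append((len(self.log) + 1, key, value))
        for site in self.sites:
            if site.online:
                self.catch_up(site)

    def catch_up(self, site):
        # Replay whatever the site has not yet seen.
        for seq, key, value in self.log[site.applied:]:
            site.apply(seq, key, value)

    def read(self, key):
        # Client-side failover: use the first site that is reachable.
        for site in self.sites:
            if site.online:
                return site.name, site.state.get(key)
        raise RuntimeError("no site available")


main, standby = Site("main"), Site("standby")
repo = ReplicatedRepository([main, standby])
repo.commit("main.c", "rev 1")
main.online = False                    # the main site's disk crashes
repo.commit("main.c", "rev 2")         # work continues against the standby
served_by, value = repo.read("main.c")
main.online = True
repo.catch_up(main)                    # the repaired site replays what it missed
```

Only the transactions travel between sites, not the repository contents, which is where the bandwidth saving comes from.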


I remember the first time our main repository disk crashed and we were able to continue on without a hiccup using our multi-site as a warm standby recovery capability. Now a RAID (disk array) would have worked here too, but there are less neat "disasters" that may require a multi-site based solution.


As a buyer of CM/ALM tools, it's not sufficient that a vendor advertise such capabilities. If this capability is important, it must be easy and reliable. Your vendor should be able to readily demonstrate the capability, or better yet, have you demonstrate it yourself. A complex warm-standby capability may cause you to delay its implementation, or may defeat you when the need to use it arises.

Data Segregation
Of course, securing your data goes beyond maintaining data integrity. It also includes the ability to control data access. Beyond basic password control, your CM/ALM tool should provide a means of segregating data into (possibly overlapping) partitions that are appropriate for each user's level of access. Access control can get very complex very quickly. It's important to have a variety of means to control access so that complexity can be kept to a minimum.
Here are some examples:

·         Segregation of data by keeping it in separate repositories, each guarded by user name/password/etc. controls.

·         Segregation by role, so that certain roles are required to see and manipulate some of the data. For example, a customer role may have limited query capabilities, while a developer role may have much broader access.

·         Data filters can be used to arbitrarily filter what gets through to an end user. This can be done on a data classification or other basis.

In our shop we use a number of these. We use passwords for repository access. We use data ownership along with data permissions and access control lists. We use product-based data segregation so that all data relating to a given product is accessible only to those users who have been given access to the product. The beauty of this latter method is that a simple product specification identifies which problem reports, documents, source code, activities/tasks, etc. are visible for a given user (or class of users). Such a solution is a simple means of performing ITAR data segregation. We also have dynamic user interface customization that adapts to roles and permissions granted to our users.
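Product-based segregation, as described above, boils down to one filter applied to every kind of record. The sketch below is only an illustration: the `Record` type, the `PRODUCT_ACCESS` table, and the user and product names are made up, and a real tool would enforce this inside the repository rather than in client code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str      # e.g. "problem report", "document", "source file"
    name: str
    product: str   # the product this record belongs to


# Hypothetical table: which products each user has been granted.
PRODUCT_ACCESS = {
    "alice": {"router", "switch"},
    "carol": {"switch"},
}


def visible_records(user, records, access=PRODUCT_ACCESS):
    """Return only the records whose product the user may see."""
    allowed = access.get(user, set())
    return [r for r in records if r.product in allowed]


records = [
    Record("problem report", "PR-1", "router"),
    Record("source file", "fwd.c", "router"),
    Record("document", "switch-spec", "switch"),
]
carol_view = visible_records("carol", records)
```

One product grant covers problem reports, documents, and source code alike, so there is no separate rule to maintain per record type, and an unknown user sees nothing.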

Data Encryption
Of course data encryption is another means of securing data. Unless you have the key to unlock the data, it does not matter that you can access the raw files. Encryption applies to documents, source code and other data. But it is also useful during network transmissions, especially given how viruses have evolved over the years.

Encryption needs to be managed at multiple levels. For example, only the tools should have the keys to unlock data transmissions. Within the repository, there will be a need for project-specific, site-specific and user-specific keys, among others. And managing and securing keys becomes a challenge in itself.
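One way to keep the number of stored secrets manageable is to derive the project-, site-, and user-specific keys from a single protected master key. The sketch below does this with HMAC-SHA256 from Python's standard library. The `derive_key` helper and the scope names are assumptions for illustration, not something the tools discussed here are known to do, and it covers key derivation only, not the encryption itself.

```python
import hashlib
import hmac
import secrets


def derive_key(master_key: bytes, scope: str, name: str) -> bytes:
    """Derive a 32-byte key for one scope (project, site, user) and name."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    label = scope.encode("utf-8") + b"\x00" + name.encode("utf-8")
    return hmac.new(master_key, label, hashlib.sha256).digest()


master = secrets.token_bytes(32)   # in practice, held in a protected key store
project_key = derive_key(master, "project", "router")
site_key = derive_key(master, "site", "ottawa")
user_key = derive_key(master, "user", "alice")
```

The same master key and label always yield the same derived key, so only the master key needs safeguarding, while a leaked project key does not expose the others.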

A Few Other Factors
Some CM/ALM tools provide a means for inserting, automatically, copyright notices into source code. This helps to address the intellectual property concerns. However, it is recommended that you embed some hidden copyright notices into your program code, so that if push comes to shove, it's easy to demonstrate that you really own the copyright. Of course within a CM setting, maintaining the history and evolution of your product will also speak volumes against anyone with claims to your IP. Imagine how easily the Unix code claims could be handled if the code were all properly managed in a CM/ALM system with full change control and traceability.
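Automatic insertion of a visible notice is simple enough to sketch. The `add_copyright` function, the notice wording, and the company name below are hypothetical. A CM/ALM tool would typically run something like this at check-in time.

```python
def add_copyright(source: str, owner: str, year: int, comment_prefix: str = "//") -> str:
    """Prepend a copyright notice unless the first line already carries one."""
    first_line = source.split("\n", 1)[0]
    if "Copyright" in first_line:
        return source   # already stamped, so leave the file unchanged
    notice = f"{comment_prefix} Copyright (c) {year} {owner}. All rights reserved."
    return f"{notice}\n{source}"


stamped = add_copyright("int main(void) { return 0; }\n", "Example Corp", 2024)
stamped_again = add_copyright(stamped, "Example Corp", 2024)   # no second notice
```

Checking before inserting matters because the operation runs on every check-in, and it must not stack notices on files that already have one.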


Another key factor to consider is consistency of your solution across the lifecycle management applications. Ideally you want a single consistent means of implementing data security and integrity. An integrated ALM tool will likely give you fewer headaches than will point solutions which are glued together. As with anything relating to CM, automation is the key. Whether it's backups, encryption, recovery, or otherwise, the more that you can automate, the better off the solution will be: more reliable, less error prone, and easier to administer.

At What Cost?
The Department of Defense seems to have deep pockets when it comes to securing development assets. What about for the rest of us? What can we hope to afford? The last few years have pushed security features from the top dollar optional feature classification into the commodity capability arena. That means you should start looking for these features in the CM/ALM tools that you will use. They should not be expensive add-ons. They should not be limited to the high-priced tools. They should just be there, and the CM industry is rapidly evolving along these lines, even if a premium is demanded today.


In the end, securing your development assets requires that you use appropriate tools and investigate their architectures sufficiently prior to purchasing them. I've presented a number of capabilities and requirements here. There are many more I've neglected, some of which I'm unaware of.


There's one more development asset that's important - your team. Providing tools, processes and automation that keep your staff productive and enable them to continue learning will help you to keep your most valuable of development assets. Perhaps this is a topic for a future article.

StickyMinds is a TechWell community.