Using Working Folders in Version Control

December 1, 2008

Summary

The repository is the official archive of a project's work products. We treat our repository with great respect. In contrast, developers often treat their working folder with very little regard. It exists for the purpose of being abused. The working folder starts out worthless, nothing more than a copy of the repository. If it is destroyed, we have lost nothing, so we run risky experiments which endanger its life. In this excerpt from his online book, "Source Control HOWTO," Eric Sink explores common "best practices" for using working folders.

CVS calls it a sandbox. Subversion calls it a working directory. Vault calls it a working folder. By any of these names, a working folder is a directory hierarchy on the developer's client machine with a copy of the contents of a repository folder. The very basic workflow of using source control involves three steps:

Update the working folder so that it exactly matches the latest contents of the repository.
Make some changes to the working folder.
Check-in (or commit) those changes to the repository.

The repository is the official archive of our work. We treat our repository with great respect. We are extremely careful about what gets checked in. We buy backup disks and RAID arrays and air conditioners and whatever it takes to make sure our precious repository is always comfortable and happy.

Best Practice: Don't let Your Working Folder Become too Valuable

Check in your work to the repository as often as you can without breaking the build.

In contrast, we treat our working folder with very little regard. It exists for the purpose of being abused. Our working folder starts out worthless, nothing more than a copy of the repository. If it is destroyed, we have lost nothing, so we run risky experiments which endanger its life. We attempt code changes which we are not sure will ever work. Sometimes the contents of our working folder won't even compile, much less pass the test suite. Sometimes our code changes turn out to be a Really Bad Idea, so we simply discard the entire working folder and get a new one.

But if our code changes turn out to be useful, things change in a very big way. Our working folder suddenly has value. In fact, it is quite precious. The only copy of our most recent efforts is sitting on a crappy, laptop-grade hard disk which gets physically moved four times a day and never gets backed up. The stress of this situation is almost intolerable. We want to get those changes checked in to the repository as quickly as possible.

Once we do, we breathe a sigh of relief. Our working folder has once again become worthless, as it should be.

Hidden State Information

Once again I need to spend some time explaining grungy details of how SCM tools work. I don't want to repeat the apology I used in the last chapter, so the following line of "code" should suffice:

Response.Write(previousChapter.Section["Cars and Clocks"]);

Best Practice: Use Non-Working Folders When you are not Working

SCM tools need this "hidden state information" so they can efficiently keep track of things as you make changes to your working folder. However, sometimes you want to retrieve files from the repository with no plan of making changes to them. For example, if you are retrieving files to make a source tarball, or for the purpose of doing an automated build, you don't really need the hidden state information at all.

Your SCM tool probably has a way to retrieve things "plain," without writing the hidden state information anywhere. I call this a "non-working folder." In Vault, this is done automatically whenever you retrieve files to a destination which is not configured as the working folder, although I sometimes wish we had made this functionality a completely separate command.

Let's suppose I have a brand new working folder. In other words, I started with nothing at all and I retrieved the latest versions from the repository. At this moment, my new working folder is completely in sync with the contents of the repository. But that condition is not likely to last for long. I will be making changes to some of the files in my working folder, so it will be "newer" than the repository. Other developers may be checking in their changes to the repository, thus making my working folder "out of date." My working folder is going to be new and old at the same time. Things are going to get confusing. The SCM tool is responsible for keeping track of everything. In fact, it must keep track of the state of each file individually.

For housekeeping purposes, the SCM tool usually keeps a bit of extra information on the client side. When a file is retrieved, the SCM client stores its contents in the corresponding working file, but it also records certain information for later. Examples:

Your SCM tool may record the timestamp on the working file, so that it can later detect if you have modified it.
It may record the version number of the repository file that was retrieved, so that it may later know the starting point from which you began to make your changes.
It may even tuck away a complete copy of the file that was retrieved, so that it can show you a diff without accessing the server.

I call this information "hidden state information." Its exact location depends on which SCM tool you are using. Subversion hides it in invisible subdirectories in your working directory. Vault can work similarly, but by default it stores hidden state information in the current user's "Application Data" directory.
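To make the three examples above concrete, here is a minimal Python sketch of what a per-file record might look like. Everything in it is an assumption made for illustration: the `HiddenState` class, the `record_retrieval` and `looks_modified` helpers, and the choice to name the baseline copy after a hash of the repository path. It is not how Vault, Subversion, or any other tool actually lays out its hidden state.

```python
import hashlib
import os
import shutil
from dataclasses import dataclass


@dataclass
class HiddenState:
    """Hypothetical per-file record kept on the client side."""
    repo_path: str         # where the file lives in the repository
    base_version: int      # repository version that was retrieved
    recorded_mtime: float  # timestamp of the working file at retrieval
    baseline_copy: str     # location of the untouched copy, kept for diffs


def record_retrieval(working_file, repo_path, version, state_dir):
    """Record hidden state for a file that was just retrieved."""
    os.makedirs(state_dir, exist_ok=True)
    # Assumption: name the baseline after a hash of the repository path.
    name = hashlib.sha1(repo_path.encode("utf-8")).hexdigest()
    baseline = os.path.join(state_dir, name)
    shutil.copy2(working_file, baseline)
    return HiddenState(repo_path, version,
                       os.path.getmtime(working_file), baseline)


def looks_modified(working_file, state):
    """Cheap check: has the timestamp changed since retrieval?"""
    return os.path.getmtime(working_file) != state.recorded_mtime
```

A timestamp comparison is only a hint. A tool that keeps a baseline copy could confirm a real change by comparing contents.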

Working File States

Because of the changes happening on both the client and the server, a working file can be in one of several possible states. SCM tools typically have some way of displaying the state of each file to the user. Vault shows file states in the main window. CVS shows them in response to the 'cvs status' command.

The table below shows the possible states for a working file. The column on the left shows my particular name for each of these states, which through no coincidence is the name that Vault uses. The column on the far right shows the name shown by the 'cvs status' command. However, the terminology doesn't really matter. One way or another, your SCM tool is probably keeping track of all these things and can tell you the state of any file in your working folder hierarchy.
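The two yes/no columns of the table amount to a small decision function. The Python sketch below is one way to write it down, using the Vault state names from the left column. The function name, its parameters, and the ordering of the checks are assumptions for illustration. In particular, `checked_out=None` is my shorthand for "this team does not use checkouts," and the table does not say what a Renegade file becomes when the repository is also newer, so the sketch simply reports "Needs Merge" in that case.

```python
def working_file_state(exists, tracked, modified, repo_newer,
                       checked_out=None):
    """Classify a working file using the state names from the table.

    exists      -- is there a working file on disk?
    tracked     -- does the client hold hidden state for it?
    modified    -- has the working file changed since retrieval?
    repo_newer  -- does the repository have a newer version?
    checked_out -- None if checkouts are not in use, else True/False
    """
    if not exists:
        return "Missing"
    if not tracked:
        return "Unknown"
    if modified and repo_newer:
        return "Needs Merge"
    if modified:
        # Edited without a checkout, in a workflow that requires one.
        return "Renegade" if checked_out is False else "Edited"
    if repo_newer:
        return "Old"
    return "None"


# A file I edited while a teammate checked in a newer version:
print(working_file_state(True, True, modified=True, repo_newer=True))
```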

State Name	Has the working file been modified?	Does the repository have a newer version than the last one retrieved?	Remarks	'cvs status'
None	No	No	The working file matches the latest version in the repository.	Up-to-date
Old	No	Yes		Needs Patch
Edited	Yes	No		Locally Modified
Needs Merge	Yes	Yes		Needs Merge
Missing	N/A	N/A	The working file does not exist.	Needs Checkout
Renegade	Yes	No	You have modified a file without first checking it out.	N/A
Unknown	No	No	There is a working file, but the SCM tool has no hidden state information about it.	Unknown

Refresh

In order to keep all this file status information current, the SCM client must have ways of staying up to date with everything that is happening. Whenever something changes in the working folders or in the repository, the SCM client wants to know.

Changes in the working folders on the client side are relatively easy. The SCM client can quickly scan files in the working folders to determine what has changed. On some operating systems, the client can register to be notified of changes to any file.

Notification of changes on the server can be a bit trickier. The Vault client periodically queries the server to ask for the latest version of the repository tree structure. Most of the time, the server will simply respond that "nothing has changed." However, when something has in fact changed, the client receives a list of things which have changed since the last time that client asked for the tree structure.

For example, let's assume Laura retrieves the tree structure and is informed that foo.cpp is at version 7. Later, Wilbur checks in a change to foo.cpp and creates version 8. The next time Laura's Vault client performs a refresh, it will ask the server if there is anything new. The server will send down a list, informing her client that foo.cpp is now at version 8. The actual bits for foo.cpp will not be sent until Laura specifically asks for them. For now, we just want the client to have enough information so that it can inform Laura that her copy of foo.cpp is now "Old."
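The Laura and Wilbur exchange can be sketched in a few lines of Python. This is a toy model, not the Vault protocol: the `Server` and `Client` classes, the single repository-wide revision counter, and all method names are assumptions I made up to show the shape of the conversation, where only version numbers travel and file contents do not.

```python
class Server:
    def __init__(self):
        self.revision = 0     # bumped on every checkin (assumed)
        self.versions = {}    # path -> latest version number
        self.changed_at = {}  # path -> revision of its last change

    def checkin(self, path):
        self.revision += 1
        self.versions[path] = self.versions.get(path, 0) + 1
        self.changed_at[path] = self.revision

    def changes_since(self, revision):
        """Return the current revision and what changed after `revision`."""
        changed = {path: self.versions[path]
                   for path, rev in self.changed_at.items()
                   if rev > revision}
        return self.revision, changed


class Client:
    def __init__(self):
        self.last_seen = 0      # revision as of the previous refresh
        self.known_latest = {}  # path -> newest version we have heard of
        self.retrieved = {}     # path -> version in the working folder

    def refresh(self, server):
        """Ask what is new. Only version numbers come back, no contents."""
        self.last_seen, changed = server.changes_since(self.last_seen)
        self.known_latest.update(changed)
        return changed

    def get_latest(self, path):
        # Stands in for actually downloading the file.
        self.retrieved[path] = self.known_latest[path]

    def is_old(self, path):
        return self.known_latest[path] > self.retrieved[path]
```

In this model, a refresh with nothing new returns an empty dictionary, which plays the role of the server's "nothing has changed" reply.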

Operations that Involve a Working Folder

OK, let's go back to speaking a bit more about practical matters. In terms of actual usage, most interaction with your SCM tool happens in and around your working folder. The following operations are the basic things I can do to a working folder:

Make changes	This is the whole point.
Review changes	Show me the changes I have made to my working folder so far.
Undo changes	Some of my changes didn't work out the way I planned. Undo them, restoring my working folder back to the way it was when I started.
Update	The repository has changes which I want to be included in my working folder.
Commit changes	I'm ready to send my changes to the repository and make them permanent.

In the following sections, I will cover each of these operations in a bit more detail.

Make the Changes

The primary thing you do to a working folder is make changes to it. In an idealized world, it would be really nice if the SCM tool didn't have to be involved at all. The developer would simply work, making all kinds of changes to the working folder while the SCM tool eavesdrops, keeping an accurate list of every change that has been made.

Unfortunately, this perfect world isn't quite available. Most operations on a working folder cannot be automatically detected by the SCM client. They must be explicitly indicated by the user. Examples:

It would be unwise for the SCM client to notice that a file is "Missing" and automatically assume it should be deleted from the repository.
Automatically inferring an "Add" operation is similarly unsafe. We don't want our SCM tool automatically adding any file which happens to show up in our working folder.
Rename and move operations also cannot be reliably divined by mere observation of the result. If I rename foo.cpp to bar.cpp, how can my SCM client know what really happened? As far as it can tell, I might have deleted foo.cpp and added bar.cpp as a new file.

All of these so-called "folder-level" operations require the user to explicitly give a command to the SCM tool. The resulting operation is added to the pending change set, which is the list of all changes that are waiting to be committed to the repository.

However, it just so happens that in the most common case, our "eavesdropping" ideal is available. Developers who use the edit-merge-commit model typically do not issue any explicit command telling the SCM tool of their intention to edit a file. The files in their working folder are left in a writable state, so they simply open their text editor or their IDE and begin making changes. At the appropriate time, the SCM tool will notice the change and add that file to the pending change set.
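The split between what can be noticed and what must be stated can be sketched as follows. This Python example assumes the client has recorded a timestamp for each file it retrieved. The `scan_for_edits` function and the `PendingChangeSet` class are hypothetical names, and real tools differ in how they detect edits.

```python
import os


def scan_for_edits(working_root, recorded_mtimes):
    """Compare working files against the timestamps recorded at retrieval.

    Returns (edited, missing). Missing files are only reported; nothing
    here assumes they should be deleted from the repository.
    """
    edited, missing = [], []
    for rel_path, mtime in sorted(recorded_mtimes.items()):
        full_path = os.path.join(working_root, rel_path)
        if not os.path.exists(full_path):
            missing.append(rel_path)
        elif os.path.getmtime(full_path) != mtime:
            edited.append(rel_path)
    return edited, missing


class PendingChangeSet:
    """Changes waiting to be committed (hypothetical structure)."""

    def __init__(self):
        self.items = []

    def note_edits(self, edited):
        # Edits are the one case the client may add by itself.
        for path in edited:
            if ("modify", path) not in self.items:
                self.items.append(("modify", path))

    # Folder-level operations are recorded only on explicit request.
    def add(self, path):
        self.items.append(("add", path))

    def delete(self, path):
        self.items.append(("delete", path))

    def rename(self, old_path, new_path):
        self.items.append(("rename", old_path, new_path))
```

Note that the scan never calls `delete` or `add` on its own. A file it has never heard of, or one that has vanished, waits for the user to say what was meant.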

Users who prefer "checkout-edit-checkin" actually have a somewhat more consistent rule for their work. The SCM tool must be explicitly informed of all changes to the working folder. All files in their working folder are usually marked read-only. The SCM tool's Checkout command not only informs the server of the checkout request, but it also flips the bit on the working file to make it writable.

Review Changes

One of the most important features provided by a working folder is the ability to review all of the changes I have made. For SCM tools that do keep track of a pending change set (Vault, Perforce, Subversion), this is the place to start. The following screen dump shows the pending change set pane from the Vault client, which is showing me that I have currently made two changes in my working folder:

The pending change set view shows all kinds of changes, including adds, deletes, renames, moves, and modified files. It is helpful to keep an eye on the pending change set as I work, verifying that I have not forgotten anything.

However, for the case of a modified file, this visual display only shows me which files have changed. To really review my changes, I need to actually look inside the modified files. For this, I invoke a diff tool. The following screen dump is from a popular Windows diff tool called Beyond Compare:

This picture is fairly typical of the visual diff tool genre, showing both files side-by-side and highlighting the parts that are different. There are quite a few tools like this. The following screen dump is from the visual diff tool which is provided with Vault:

Best Practice: Run Diff Just Before you Checkin, Every Time

Never checkin your changes without giving them a quick review in some sort of a diff tool.

The left panel shows version 21 of sgdmgui_props.cpp, which is the current version in the repository. The right panel shows my working file. The colored regions show exactly what has changed:

On line 33 I changed the type of this function from long to short.
At line 35 I inserted a one-line comment.

Note that SourceGear's diff tool shows inserted lines by drawing lines in the center gap to indicate exactly where the insertion occurs. In contrast, Beyond Compare is showing a dead region on the left side across from the inserted line on the right. This particular issue is a matter of personal preference. The latter approach does have the benefit that identical lines are always across from each other.

Both of these tools do a nice job on the modification to line 33, showing exactly which part of the line was changed. Most of the recent visual diff tools support this ability to highlight intraline differences.

Visual diff tools are indispensable. They give me a way to quickly review exactly what has changed. I strongly recommend you make a habit of reviewing all of your changes just before you checkin. You can catch a lot of silly mistakes by taking the time to be sure that your changes look the way you think they look.
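If no visual diff tool is at hand, even a plain text diff is enough for a last look before checkin. The sketch below uses Python's standard `difflib` module to compare a baseline against a working file. The `review_changes` helper and the sample file contents are invented for illustration and only loosely echo the two changes described above.

```python
import difflib


def review_changes(baseline_text, working_text, name):
    """Return a unified diff between the baseline and the working file."""
    diff = difflib.unified_diff(
        baseline_text.splitlines(keepends=True),
        working_text.splitlines(keepends=True),
        fromfile=name + " (baseline)",
        tofile=name + " (working)",
    )
    return "".join(diff)


baseline = "long get_size();\nint x;\n"
working = "short get_size();\n// size fits in 16 bits\nint x;\n"
print(review_changes(baseline, working, "sgdmgui_props.cpp"))
```

An empty result means the working file matches its baseline, so there is nothing to review.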

Undo Changes

Sometimes I make changes which I simply don't intend to keep. Perhaps I tried to fix a bug and discovered that my fix introduced five new bugs that are worse than the one I started with. Or perhaps I just changed my mind. In any case, a very nice feature of a working folder is the ability to undo.

In the case of a folder-level operation, perhaps the Undo command should actually be called "Nevermind." After all, the operation is pending. It hasn't happened yet. I'm not really saying that I want to Undo something which has already happened. Rather, I am just saying that I no longer want to do something that I previously said I wanted to do.

For example, if I tell the Vault client to delete a file, the file isn't really deleted until I commit that change to the repository. In the meantime, it is merely waiting around in my pending change set. If I then tell the Vault client to Undo this operation, the only thing that actually has to happen is to remove it from my pending change set.

Best Practice: Be Careful with Undo

When you tell your SCM client to undo the changes you have made to a file, those changes will be lost. If your working folder has become valuable, be careful with it.

In the case of a modified file, the Undo command simply overwrites the working file with the "baseline" version, the one that I last retrieved. Since Vault has been keeping a copy of this baseline version, it merely needs to copy this baseline file from its place in the hidden state information over the working file.
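Both flavors of undo are small operations, which the following Python sketch tries to show. The function names and the representation of the pending change set as a list of tuples are assumptions for illustration, not Vault's internals.

```python
import shutil


def undo_pending_operation(pending_items, item):
    """'Nevermind': drop an operation that has not been committed yet."""
    pending_items.remove(item)


def undo_edit(working_file, baseline_copy):
    """Overwrite the working file with the baseline that was retrieved.

    Any local changes in working_file are lost, so call with care.
    """
    shutil.copyfile(baseline_copy, working_file)
```

Neither function needs to talk to the server, because the baseline copy is already sitting in the hidden state information.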

For users who use the checkout-edit-checkin style of development, closely related here is the need to undo a checkout. This is essentially similar to undoing the changes in a file, but involves the extra step of informing the server that I no longer want the file to be checked out.

Digression: Your Skillet is not a Working Folder

Source control tools have been a daily part of my life for well over a decade. I can't imagine doing software development without them. In fact, I have developed habits that occasionally threaten my mental health. Things would be so much easier if the concept of a working folder were available in other areas of life:

"Hmmm. I can't remember which of these pool chemicals I have already done. Luckily, I can just diff against the version of the pool water from an hour ago and see exactly what changes I have made."
"Boy am I glad I remembered to set the read-only bit on my front lawn to remind me that I'm not supposed to cut the grass until a week after the fertilizer was applied."
"No worries -- if I accidentally put too much pepper on this chicken, I can just revert to the latest version in the repository."

Unfortunately, SCM tools are unique. When I make a mistake in my woodshop, I can't undo it. Only in software development do I have the luxury of a working folder. It's a place where I can work without constantly worrying about making a mistake. It's a place where I can work without having to be too careful. It's a place where I can experiment with ideas that may not work out. I wish I had working folders everywhere.

Update the Working Folder

Ten milliseconds after I retrieve a fresh working folder, it might be out of date. An SCM repository is a busy hub of activity. New stuff arrives regularly as team members finish tasks and checkin their work.

I don't like to let my working folder get too far behind the current state of the repository. SCM tools typically allow the user to invoke a diff tool to compare two repository versions of a file. When I am working on a feature, I periodically like to review the recent changes in the repository. Unless those changes look likely to disrupt my own work, I usually proceed to retrieve the latest versions of things so that my working folder stays up to date.

In CVS, the command to update a working folder is [rather conveniently] called 'update.' In Vault, this operation is done with the Get Latest Version command. The screen dump below is the corresponding dialog box:

Best Practice: Don't get too Far Behind

Update your working folder as often as you can.

I want to update my working folder to contain all of the changes available on the server, so I have invoked the Get Latest Version operation starting at the very top folder of my repository. The Recursive checkbox in the dialog above indicates that this operation will recursively apply to every subfolder.

Note that this dialog box gives me a few choices for how I may want to handle situations where a change has happened on both the client and the server. Let us suppose for a moment that I am not using exclusive checkouts and that somebody else has also modified sgdmgui_props.cpp. In this case, I have three choices available when I want to update my working folder:

Overwrite my working file. The effect here is similar to an Undo. My changes will be lost. Use with care.
Attempt automatic merge. The Vault client will attempt to construct a file which contains my changes and the changes which were made on the server. If the automerge succeeds, my working file will end up in the "Edited" status. If the automerge fails, the status of my working file will be "Needs Merge", and the Vault client will nag and pester me until I resolve the situation.
Do not overwrite/Merge later. This option leaves my working file untouched. However, the status of the file will change to "Needs Merge". Vault will not allow me to checkin my changes until I affirm that I have done the right thing and merged in the changes from the repository.

Note also that the "Prompt for modified files" checkbox allows me to specify that I want the Vault client to allow me to choose between these options for every file that ends up in this situation.
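The three choices differ mainly in the state they leave the working file in, which can be summarized as a small Python sketch. The `OnConflict` enum and the `state_after_update` function are hypothetical names. The outcomes for the merge options come straight from the descriptions above, while the "None" result for the overwrite option is my inference from the fact that the working file then matches the repository.

```python
from enum import Enum


class OnConflict(Enum):
    OVERWRITE = "overwrite my working file"
    AUTOMERGE = "attempt automatic merge"
    MERGE_LATER = "do not overwrite / merge later"


def state_after_update(choice, automerge_succeeds=False):
    """State of a file changed on both sides, after Get Latest Version."""
    if choice is OnConflict.OVERWRITE:
        # Local edits are discarded (inferred resulting state).
        return "None"
    if choice is OnConflict.AUTOMERGE:
        return "Edited" if automerge_succeeds else "Needs Merge"
    # MERGE_LATER leaves the working file untouched.
    return "Needs Merge"
```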

As you can see, the Get Latest Version dialog box includes a few other options which I won't describe in detail here. Other SCM tools have similar abilities, although the user interface may be very different. In any case, it's a good idea to update your working folder as often as you can.

Commit Changes

In most situations, I eventually decide that my changes are Good and should be sent back to the repository so they can become a permanent part of the history of my project. In Vault, Subversion and CVS, the command is called Commit. The following screen dump shows the Commit dialog box from Vault:

Note that the listbox at the top contains all of the items in my pending change set. In this particular example, I only have two changes, but this listbox typically has a scrollbar and contains lots of items. I can review all of the operations and choose exactly which ones I want to commit to the repository. It is possible that I may want to checkin only some of my currently pending changes. (Perforce has a nifty solution to this problem. The user can have multiple pending change sets, so that changes can be logically grouped together even as they are waiting to be checked in.)

The "Change Set Comment" textbox offers a place for me to type an explanation of what I changed and why I did it. Please note that this textbox has a scrollbar, encouraging you to type as much text as necessary to give a full explanation of the problem. In my opinion, checkin comments are more important than the comments in the actual code.

When I click OK, all of the selected items will be sent to the server to be committed to the repository. Since Vault supports atomic checkin transactions, I know that my changes will succeed or fail as a united group. It is not possible for the repository to end up in a state where only some of these changes made it.
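The all-or-nothing behavior can be illustrated with a toy repository held in a Python dictionary. This is only a sketch of the idea of an atomic transaction, under the assumption that a change set is a list of `(operation, path, contents)` tuples. The `commit_atomically` name is made up, and a real server would rely on database transactions rather than copying the whole repository.

```python
import copy


def commit_atomically(repository, changes):
    """Apply every change or none of them.

    repository -- dict mapping path -> contents (toy stand-in)
    changes    -- list of (operation, path, contents) tuples
    """
    staged = copy.deepcopy(repository)  # work on a private copy
    for operation, path, contents in changes:
        if operation == "delete":
            if path not in staged:
                raise ValueError("cannot delete missing file: " + path)
            del staged[path]
        elif operation in ("add", "modify"):
            staged[path] = contents
        else:
            raise ValueError("unknown operation: " + operation)
    # Only reached if every change applied cleanly.
    repository.clear()
    repository.update(staged)
```

If any change in the set raises an error, the function exits before the final swap and the repository is left exactly as it was.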

#region CARS_AND_CLOCKS

Remember the discussion in the last edition about binary file deltas? This same technology is also used for checkin operations. When Vault sends a modified version of a file up to the server, it actually sends only the bytes which have changed, using the same VCDiff format which is used to make repository storage more efficient.

This is possible because the Vault client has kept a copy of the baseline file in the hidden state information. The client simply runs the VCDiff algorithm to construct the difference between this baseline file and the current working file. So in the case of my running example, the Vault client will send three pieces of information:

The binary delta. Since the pending change set pane shows that my working file is 40 bytes larger than the baseline where I started, the binary delta is going to be somewhere in the vicinity of 40 bytes long, perhaps with a few extra bytes for overhead.
The fact that this binary delta was computed against version 21 of the file. Since version 21 is known and exists on both the client and the server, the SCM server can simply apply the binary delta to its own copy of version 21 to reconstruct an exact copy of the contents of my working file.
The CRC checksum of the original working file. When the server reconstructs its copy of the working file, the CRC will be compared to ensure that nothing was corrupted during transit. The file that is stored in the repository will be exactly the same as the working file. No corruption, no surprises.
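The three pieces above can be sketched end to end in Python. One loud caveat: the delta format here is a naive copy/insert list built with the standard `difflib` module, used only as a stand-in. It is not VCDiff, and the message layout and all function names are assumptions for illustration. The checksum uses `zlib.crc32` from the standard library.

```python
import difflib
import zlib


def make_delta(baseline, working):
    """Naive delta: copy ranges from the baseline, plus literal new bytes."""
    ops = []
    matcher = difflib.SequenceMatcher(None, baseline, working,
                                      autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            # replace or insert: ship the new bytes themselves
            ops.append(("data", working[j1:j2]))
    return ops


def apply_delta(baseline, ops):
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += baseline[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def build_checkin(baseline, working, base_version):
    """Client side: the three pieces of information to send."""
    return {
        "delta": make_delta(baseline, working),
        "base_version": base_version,
        "crc": zlib.crc32(working),
    }


def server_apply(stored_versions, message):
    """Server side: rebuild the working file and verify its checksum."""
    baseline = stored_versions[message["base_version"]]
    rebuilt = apply_delta(baseline, message["delta"])
    if zlib.crc32(rebuilt) != message["crc"]:
        raise ValueError("checksum mismatch: file corrupted in transit")
    return rebuilt
```

The server can rebuild the file only because it holds the same version the delta was computed against, which is why the base version number travels with the delta.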

Whenever possible, Vault uses binary file deltas "over the wire" in both directions, from client to server as well as from server to client. In this example, the entire file is only 3,762 bytes, so the savings in network bandwidth isn't all that significant. However, for larger files, the increase in network performance for offsite users can be quite dramatic.

This capability of using binary file deltas between client and server is supported by some other SCM tools as well, including (I believe) Subversion and Perforce.

#endregion

When the checkin has completed successfully, if I am working in "checkout-edit-checkin" mode, the SCM tool will flip the read-only bit on my working files to prevent me from accidentally making changes without informing the server of my intentions.

With my checkin done, the cycle is complete. My working folder is once again worthless, since my changes are a permanent part of the repository. I am ready to start again on my next development task.

Eric Sink is a software developer at SourceGear, which makes source control (aka "version control," "SCM") tools for Windows developers. He founded the AbiWord project and was responsible for much of the original design and implementation. Prior to SourceGear, he was the Project Lead for the browser team at Spyglass (now OpenTV) who built the original versions of the browser you now know as "Internet Explorer." Eric received his B.S. in Computer Science from the University of Illinois at Urbana-Champaign. The title on Eric's business card says "Software Craftsman." You can reach Eric at [email protected]. This series of articles from Eric Sink is part of his online book called Source Control HOWTO, a best practices guide on source control, version control, and configuration management. You can find it online at http://software.ericsink.com/scm/source_control.html.
