Getting Empirical about Refactoring


Often when we refactor, we look at local areas of code. If we take a wider view, using information from our version control systems, we can get a better sense of the effects of our refactoring efforts.

Refactoring has been around for ages, and most programmers at least pay lip service to it. In my experience, people approach refactoring in one of two ways. They either build it into their day-to-day work as an integral practice, or they talk about it as an avalanche task—something that they do in very large chunks every other sprint or so.

If you’ve studied up on refactoring, you know that making it part of your daily practice is the way to go. You look at a piece of code you are trying to understand or are about to change, and if it is not in a good state, you refactor. Over time, your code base gets better, except for the parts that don’t.

Wait. Why wouldn’t your code get uniformly better over time when you refactor what you touch? The fact is that we don’t touch code in our code bases with uniform frequency. Some classes get touched about once every month or so, and some—well, you wrote them a long time ago and there’s a good chance that you’ll never modify them again. The net effect of this is that some parts of our code get better over time and other parts don’t. This isn’t as bad as it sounds. We leave the code that we touch in a better state over time, and we’ll find ourselves visiting fewer and fewer bad areas of code. There may be messy areas remaining in the code base, but the fact that we don’t work in them reduces their impact on us.

In short, “refactor as you go” is a heuristic that aims our effort toward the code that gets in our way when we are programming. If we apply it diligently, we end up working in better code more often.

The only problem with this chain of reasoning is that it’s hard to be diligent. Some code is downright scary. It’s easy to think that a particular long method doesn’t matter much or that the effort to refactor it won’t be paid back, but we don’t have to guess. We can gather information that helps us understand the impact of our refactoring decisions.

Let’s take a look at some of the information we can acquire from our version control systems. Figure 1 is a graph of the number of commits for every file in a particular project’s code repository. The files are sorted in order of increasing commits.

Figure 1: Number of commits per file, with files sorted in order of increasing commits

In this code base, a few files are changed extremely frequently, but the vast majority have only a few changes. This is typical. Most of the code bases I’ve created this graph for have roughly the same shape.
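The data behind a graph like this is straightforward to gather. The following is a minimal sketch, not the tooling used to produce the figure. It assumes the project is in Git (the article does not name a version control system) and that Python 3.7 or later is available. The function names are hypothetical. It counts how many commits touched each path and sorts the paths in order of increasing commits.

```python
import subprocess
from collections import Counter


def read_git_log(repo_dir="."):
    """Return the file names touched by each commit, one path per line.

    Assumes a Git repository at repo_dir. The empty pretty format
    suppresses commit headers, so only paths and blank lines remain.
    """
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_commits_per_file(log_text):
    """Count how many commits list each path in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:  # blank lines only separate commits
            counts[path] += 1
    return counts


def sorted_by_commits(counts):
    """Return (path, commits) pairs in order of increasing commits.

    Ties are broken alphabetically by path so the order is stable.
    """
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```

Calling `sorted_by_commits(count_commits_per_file(read_git_log("path/to/repo")))` yields the pairs to plot, where the repository path is a placeholder. Two caveats apply to this sketch: a renamed file is counted under each of its names, and merge commits list no files by default, so they do not contribute to any count.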

Looking at the figure, it seems that if we are going to spend time refactoring in our code base, we should concentrate on the files on the right side of the graph. Unfortunately, this isn’t the complete picture. Files can end up on the right for a variety of reasons. In some applications, certain code files are nearly configurational: they contain code for frequently changed settings or points of variation. Files containing factory classes are often in this category. They may be changed several hundred times over the life of the project, and, technically, there is nothing wrong with them. They aren’t the sorts of classes that inspire fear in our hearts.
