Getting Empirical about Refactoring


To narrow down our refactoring candidates even further, we can take complexity into account. In figure 2, I’ve graphed the complexity for each file on the y-axis. The x-axis continues to show churn—the number of times a file has been modified.

Figure 2

These diagrams give us quite a bit of information. The upper right quadrant is particularly important. These files have a high degree of complexity, and they change quite frequently. There are a number of reasons why this can happen. The one to look out for, though, is something I call runaway conditionals. Sometimes a class becomes so complex that refactoring seems too difficult. Developers hack if-then-elses into if-then-elses, and the rat’s nest grows. These classes are particularly ripe for a refactoring investment.

The other areas of the graph can give us some interesting indicators about the design style of the team. In healthy code, most of the files are in the lower left quadrant. I call this the healthy closure region. Abstractions here have low complexity and don’t change much. The upper left is what I call the cowboy region. This is complex code that sprang from someone’s head and didn’t seem to grow incrementally.

The last region—the bottom right—is very interesting. I call it the fertile ground. It can consist of files that are somewhat configurational, but often there are also files that act as incubators for new abstractions. People add code, it grows, and then they factor outward, extracting new classes. The files churn frequently, but their complexity remains low.

It’s rather easy to create these graphs. The project I used here is an open source project hosted on github. I wrote a small script to grep the log of the repository for all file adds and modification. Then, I sorted the list and use the uniq command to get the number of changes for each file. If you have a tool handy that computes some complexity metric at the file level, you can aggregate data for the other axis. I used the free tool SourceMonitor.

If we refactor as we make changes to our code, we end up working in progressively better code. Sometimes, however, it’s nice to take a high-level view of a code base so that we can discover where the dragons are. I’ve been finding that this churn-vs.-complexity view helps me find good refactoring candidates and also gives me a good snapshot view of the design, commit, and refactoring styles of a team.

Quite often, metrics views of code are restricted to static measures of code quality. Adding the time dimension through version-control history gives us a broader view. We can use that view to guide our refactoring decisions.

About the author

StickyMinds is a TechWell community.

Through conferences, training, consulting, and online resources, TechWell helps you develop and deliver great software every day.