Better Software Home > In This Issue > Featured Article
| || |
|Refactoring: Small Steps Guaranteed to Help You Clean Up Your Code
by C. Keith Ray
Software development on your project has slowed to a crawl. Instead of adding new features, you and your fellow programmers want to rewrite the existing code to make it easy to work with again, which of course frightens product marketing and your boss. How do we get into these messes? What can we do about it?
Many books and courses teach principles, rules, and guidelines of good design, but somehow much of the code we write still exhibits traits of poor design, and poor design slows us down.
Code smells are one such trait of poor design. I will illustrate several code smells, using real working code from the Mozilla project, to show you how you can clean up your design by using small refactoring steps. Then you will be able to identify some of the symptoms of poor design and understand how it can be incrementally improved.
I’ve chosen the Mozilla project because it is open source, has a large code base, and was written over an extended period of time by people of varying skill levels. It is working code, delivered to end-users as the Firefox Web browser, the Thunderbird email client, and various components, libraries, and other applications. I am not a Mozilla developer, so I am viewing the code with fresh eyes.
Instead of reproducing the actual C++ code from the Mozilla project, I will translate it into pseudo-code to make it more compact and to let us ignore various C++ idioms and
issues of syntax.
I’m going to start with the code example in Figure 1: the method GetTextHelper in the class nsAccessibleText. The first two smells immediately visible are Long Method and Long Parameter List (see the Code Smells sidebar for the names and descriptions of various code smells). The original C++ method has twenty-eight lines of text, not including blank lines. That’s not terribly long, but this example does exhibit some of the same design problems as other Long Methods.
We programmers tend to write Long Methods when we’re thinking procedurally—trying to get all those nagging details right, rather than moving our thoughts up to a higher level and using abstraction and composition to divide and conquer the problem. Sometimes a single programmer writes the Long Method; other times it results from a sequence of programmers adding one more thing (or one more bug fix) to what was originally a not-so-Long Method. A Long Method is usually guilty of several other code smells; eliminating them can often reduce the Long Method to something reasonable.
So, how long is too long? I say a method is too long when it lacks cohesion and is hard to fully test. Meilir Page-Jones and Tom DeMarco define cohesion as a measure of how closely related parts of a module are to each other. Robert C. Martin calls cohesion the “Single-Responsibility Principle,” which states “a class should only have one reason to change.” (This principle also applies to methods, modules, or packages.) I’ve seen some cohesive image-processing methods that were fifty to one hundred lines of code. Outside of that specialized field, though, I don’t think I’ve ever seen a truly cohesive method longer than thirty lines. In general, smaller is better.
Figure 1 is the kind of Long Method that I call the “big series of detailed actions.” It often offers clues as to how it can be broken up. Such a method usually contains chunks of code following a comment explaining what that code is meant to do. We’re going to extract these chunks of code into new methods and use their comments to name them.
First, we focus on the comment “backup old settings” and the block that follows it. We’re going to do the Extract Method refactoring (see the StickyNotes for more on refactoring). Create a method in the same class and name it BackupOldSettings, which is what the comment says. Copy into it the block of code following the comment. Since this block of code refers to local variables and parameters, we need to pass those variables in and out of it. The new method looks like this:
Once that compiles, remove the code you copied from the GetTextHelper method and replace it with the call to this new method. Since the extracted method name says the same thing, you can eliminate the comment. The new line is:
This is just a small change, but we would want to test the code to make sure we didn’t break anything. Refactoring isn’t safe without automated tests. In your own projects, run your automated tests after each refactoring. If all your tests are still passing, then pause to briefly enjoy having done a successful refactoring. Otherwise, undo and try again.
Next we focus on the comment “Turn Off Display and Caret,” doing the same Extract Method refactoring we did previously. Because I’ve omitted the middle part of this Long Method, the last portion we’ll extract is the part following the comment “Restore Old Settings.” Again, we perform the Extract Method refactoring. The resulting code is shown in Figure 2.
Now each of these extracted methods is small and easily testable in isolation. And each one identifies its purpose from the name we took from the comments. The behavior is the same, and the design is a little better. We can observe that each of these extracted methods obviously exhibits the code smell Feature Envy—they are more concerned with another class than they are with the class they are in. We could just live with that smell for a while, or we can do Move Method refactorings to put the code into the class to which the methods refer, nsISelectionController.
Let’s do the Move Method refactoring, first by copying these methods into the destination class and changing the “selCon” argument to “this” (see Figure 3). If there were a lot of existing code referring to the original methods, we would preserve the original methods but re-implement them
to delegate their work to the new methods in class nsISelectionController. Since we just created them a few minutes ago, we can delete the old methods and alter GetTextHelper to call the new methods in class nsISelectionController.
Notice how the methods moved into class nsISelectionControllernow have shorter parameter lists. That’s because in my pseudo-code, and in languages like C++ and Java, the “this” pseudo-variable is not explicitly declared. These methods are now living comfortably in their “home” class rather than in some stranger’s class.
The parameters for BackupOldSettingsand RestoreOldSettings are being used to save and restore some data belonging to the class nsISelectionController. Callers of these functions should not have to care about how many and what kinds of variables are needed. This is a lack of encapsulation—a code smell called Inappropriate Intimacy. We can refactor to an implementation of the design pattern named Memento, which deals with this type of thing by creating a separate storage class to hold whatever data is necessary to restore an object to a saved state (see the StickyNotes for more on this and other design patterns). The refactored pseudo-code might look like Figure 4.
In languages like C++, the Memento class can be an “inner” class of nsISelectionController, indicating their close relationship, although I did not choose to do so in the pseudo-code. Taking advantage of C++’s “resource acquisition is initialization” idiom, the Memento class or a helper class can be defined to get the data from nsISelectionControllerin its constructor and restore that data in its destructor (see the StickyNotes for an example). Used in the proper way, the calling- code’s explicit “restore old settings” statement would go away—replaced by an implicit destructor call—and the restoration would be automatic.
Earlier I mentioned the code smell Long Parameter List. There can be a lot of reasons for Long Parameter Lists, and in fact, the length may be caused by various other code smells. We noted before that a method in its proper home class has a shorter parameter list because of the implicit “this” pseudo-variable. However, in the case of the method GetTextHelper, I notice that two of the parameters, aStartOffsetand aEndOffset, look like they may often be passed around together, which is the Data Clump smell. We address this by creating a new class:
and then refactor the parameter list to use it instead of the two separate variables:
Then we can go through as much of the existing code as we desire and change occurrences of two variables like these into an instance of the new class wherever we think it is appropriate. Now that we have a class to put methods into, we can look for smelly code involving those variables and move that code into new methods in this class.
Suppose there was some code that computed a length from the two variables:
After refactoring into a class, that code would look like this:
If we do Extract Method and Move Method on this length computation, the new method in the IntegerRange class would look something like:
And the calling code would become:
Look around for other places that compute the length this way—that’s the Duplicated Code smell. Change that code to call this new method. If we keep a sharp eye out for code that accesses the variables of this new class, we can gradually refactor this class into something full-fledged and useful, eliminating duplicated code in the process.
It may look like all of these refactorings are small changes, and they are. They are intentionally small so that performing them is as error-free as possible. Refactorings can build on each other or be independent, depending on what code and code smells are being addressed. Attempting to make a large change all at once tends to break your code and introduce errors. Performing these small refactorings is safer, especially if you test the code after each change. You can stop after any small refactoring in order to do something different, such as write new code or fix bugs, and resume making improvements later.
Doing small refactorings is worth your time because, after you’ve done enough of them, the large refactorings that you originally wanted to do (“I want to replace this whole module”) become easier. Once you’ve reduced a large number of dependencies, eliminated a lot of duplicate code, and otherwise made your classes cleaner and more cohesive, the big classes and large methods will be smaller, and the code will no longer be so entangled. Replacing that module might have been difficult before because there were many duplicated blocks depending on its private parts. After extensive refactoring, the code will be more self-contained and easier to replace.
We need not only the positive rules of design—like the definition of cohesion—but also negative rules like the code smells. Think of it in terms of a game: The rules of basketball describe not only the winning conditions and permitted actions but also what to do if a player goes out of bounds and what actions constitute fouls.
Programmers need to be sensitive to symptoms of bad design. When we identify smelly code, which slows us down, we should call a foul and figure out which refactorings can fix it. If we are not sensitive to code smells, we will continue to create code that is hard to understand, maintain, and test, and is prone to defects and bugs.
C. Keith Ray is a software engineer with experience in multiple platforms and languages. Keith currently works with a team of Macintosh developers at Intuit, Inc. and writes a blog at