The One That Got Away

Summary:

Many testers have been involved in post-ship discussions about bugs that “got away”: bugs that escaped testing and found their way into customers’ hands. Often, these post-mortem discussions end in finger pointing and threats, but with the right focus, they are a wonderful opportunity for learning and growth.

My manager, Nick, knocked once on the door and entered without waiting for a response. All he said was, “Larry wants to talk to us.” Larry was Nick’s boss—the person running our organization—and from the expression on Nick’s face, I assumed Larry was angry.

“Angry” was an understatement. Larry slammed his door as soon as we entered and gritted his teeth. Then, in his best attempt to keep his volume from disturbing the workers in neighboring offices, he said, “How in the world did you miss finding that bug?”

For many testers, this is a familiar scenario. The product ships, you have a party, and everyone on the team feels good about how things are going. The next thing you know, someone in a suit is slamming a door and asking what’s wrong with the test team.

I let Nick do the talking with Larry. He had a great way of calming Larry without admitting any specific mistakes or passing the blame to anyone else. I, on the other hand, wanted to punch Larry in the face and say, “I’m fairly certain that the test team didn’t put that bug—or any other bugs—into the product. You’re yelling at the wrong people!” I held my tongue, vowed to lead some efforts to investigate the root cause of the problem, and quietly walked back to my office.

Whose Fault Is It?
There was some truth to what I was thinking. Testers cannot be a safety net put in place to catch the bugs that fall through from the development team. The fact is that software ships with bugs, and there is no way to find every bug before shipping. Of course, testers should find the important bugs and use a risk-based approach to discover those issues, but some issues will inevitably find a way to sneak past the product team and into a customer’s hands.

You can’t really put all of the blame on the development team for putting the bug there in the first place, either. It’s an easy argument to make, but developers are just as human as testers (and everyone else, for that matter), and they will make mistakes, too.

Perhaps a better place for blame is on the analysts or program managers who created the ambiguous requirements that ultimately manifested in the software. But that tactic fails as well, both because of the human factor and because bugs take a multitude of forms other than misinterpreted requirements.

The truth is that the responsibility for the bugs that the suits are fuming about falls on everyone. Sure, a bug in the wild exists because someone made a mistake, but trying to pin the blame for a bug on one person or one team is rarely worth the effort. The bug found its way into the product, and nobody discovered the bug before releasing to customers. The bug “got away,” a customer found it, and now people are angry.

At this point, everyone on the team needs to take responsibility for determining what happened. The focus should be on discovering what went wrong, not on placing blame or making excuses. Something undesirable happened, and it’s much better to use the experience as an opportunity for learning than as an opportunity for venting anger. Team members worked together, did their best, shared their strategies and approaches with each other, and unanimously agreed the product was ready to ship. When (not if) a customer finds a bug, don’t look for blame. It’s just something that happened and, more importantly, something to learn from. There’s a reason the bug was introduced, and there’s a reason the test team didn’t find it. The primary goal is to discover those reasons and learn from them so that similar types of bugs don’t “escape” in the future.

Eliminating the Blame
How do you get the whole team to own quality—or at least to eliminate blame from the equation? The first step is to eliminate the safety net culture. I certainly don’t suggest that you stop testing, and I don’t think you can accomplish this superficially by putting up a poster that reads, “Everyone Owns Quality,” or some similar approach.

You can make progress by communicating and explaining what the test team is doing. Programmers, analysts, and whoever else is involved in making the product should do the same. Transparency and shared goals go a long way toward shifting the culture from one of blame to one of shared ownership. The more everyone on the team knows about each other’s approaches, the more likely it is that you’ll ultimately prevent blame.

Quality requires collaboration and sharing. Part of the reason Nick and I blew it with Larry was that we didn’t share enough about what we were doing. The bug occurred in an area that we didn’t test fully, but we did that on purpose. We made a risk-based decision, but we failed to communicate our strategy and approach to Larry and many others on our team.

Software Is Learning
Creating software is an opportunity to learn. Learning from each other, learning from the customer, and learning from our own mistakes should be primary goals of the software engineering process. Getting hung up on playing the blame game when something unexpected happens is a waste of time. Moving past the blame game and into a world of collaboration and knowledge acquisition is critical to transitioning from a culture of testing quality into a product to a culture that creates a better quality experience for everyone. Remember that next time you think about the one that got away.

StickyMinds is a TechWell community.