When errors are not detected during testing, somewhere down the line someone has to take responsibility. In this column, Linda Hayes shows you when and how to do so—and you might even be able to turn the situation to your advantage.
If you manage a test group, you are going to fail sooner or later. By fail I mean that an error is going to escape your most diligent efforts and wreak havoc in production. I know this because I have never met a test manager who had all the time, resources, and necessary information, and I have never seen a perfect software release. So the question is not if it will happen, but what you can do when it happens.
Picture yourself. You are in a meeting with your manager who wants to know—how did you miss this? As I see it, you have three options. You can tell them:
I didn't miss it, someone else did. You didn't even know you needed to test it because you didn't get thorough requirements. Or, it was an obscure coding issue (the maximum size of an array, for example) that only the developer would know about and so it should have been caught in a unit test. Or, it was caused by the production environment itself, through a faulty configuration or installation. Or…you get the idea.
The advantage of this approach is that these things are probably true, and you can deflect the blame to where it really belongs so that improvements can be made. The disadvantage is that you are going to end up unpopular with whoever ends up receiving the blame. No one likes a snitch.
I did miss it, but it wasn't my fault. You would have caught it, but the software was so late that you had to cut your testing short. Or the test environment was so unstable that you couldn't run all of your tests. Or your headcount request was slashed so that you were too understaffed to do a thorough job.
Again, the upside for this response is that your reasons are likely to be true, and management needs to be realistic about what you are up against. The downside is that you are going to be cast as a whiner.
I did miss it, but it won't happen again. You accept responsibility for the quality of the product and you will take steps to ensure that it does not happen again. These steps include reviewing your test plan with the users to be sure all requirements are covered, for example, or reviewing the unit test plans to be sure you understand what is being tested in development. You might also propose that you either get crunch time reinforcements from other departments when the schedule is tight, or that you get the discretion to cut less critical tests from your plan in order to make up for lost time.
The obvious drawback to this option is that you take the blame, which is somewhat risky. You don't want to become a doormat. On the other hand, you are perceived as someone who takes responsibility, and the best part is that it allows you to make the same points as the first two options but in a positive way. You not only identify the issues without casting blame, but you also suggest solutions.
Two Examples
Here's an example from my experience. We were testing an application that supported multiple back-end databases and server operating systems. There were too many possible combinations, so we chose the ones that represented the majority of customers. The one that failed, of course, was for a brand new customer who had signed a huge contract.
The Sales VP was livid. Never mind that this was a new platform and that QA did not even have access to it; only development did. Our request to buy our own server had been denied by the CFO, and our plea for a partition on development's server had been denied by the R&D VP. We literally could not have tested it even if we had tried. So, we had a veritable shooting gallery of villains, and behind closed doors I made sure that the Sales VP knew it.
But in public, at the cross-functional meeting, we mentioned none of this. What we said was that we understood the importance of testing configurations and truly regretted the problem. However, we were also aware of the need to stay within schedule and budget, and testing each configuration took about three days. So we proposed to publish the configurations we planned to test and those that we did not, and elicit input from the team to be sure we were focusing on the right ones.
The beauty of this approach was that it drew attention to the fact that we could not possibly test all the configurations. When we distributed the matrix of every potential combination of database, server, and operating system version—there were hundreds—it blew everyone away. Suddenly they were more sympathetic. Furthermore, it imposed reality: if someone insisted that we add a new configuration, we agreed to, but it meant we would either have to take another off, or add three days to the schedule, or add more resources. Their choice.
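To make the arithmetic behind that matrix concrete, here is a minimal sketch in Python. The platform names and counts are hypothetical placeholders (the column names no specific products), and the only figure taken from the text is the estimate of about three days per configuration. It assumes configurations are tested one after another.

```python
from itertools import product

# Hypothetical platform lists: the column names no specific products or versions.
DATABASES = ["db_a", "db_b", "db_c", "db_d"]
SERVERS = ["server_x", "server_y", "server_z"]
OS_VERSIONS = ["os_1", "os_2", "os_3", "os_4", "os_5"]

DAYS_PER_CONFIG = 3  # "about three days" per configuration, per the column


def build_matrix(databases, servers, os_versions):
    """Return every (database, server, os_version) combination."""
    return list(product(databases, servers, os_versions))


def split_matrix(matrix, planned):
    """Split the matrix into configurations that will and will not be tested."""
    planned_set = set(planned)
    tested = [c for c in matrix if c in planned_set]
    untested = [c for c in matrix if c not in planned_set]
    return tested, untested


def schedule_days(configs, days_per_config=DAYS_PER_CONFIG):
    """Estimate test time, assuming configurations are tested sequentially."""
    return len(configs) * days_per_config


matrix = build_matrix(DATABASES, SERVERS, OS_VERSIONS)
# Hypothetical plan: the combinations that cover the majority of customers.
planned = [("db_a", "server_x", "os_1"), ("db_b", "server_y", "os_2")]
tested, untested = split_matrix(matrix, planned)

print(f"{len(matrix)} possible, {len(tested)} planned, {len(untested)} not planned")
print(f"Planned: {schedule_days(tested)} days; full matrix: {schedule_days(matrix)} days")
```

Publishing both lists, tested and untested, is what makes the trade-off visible: each configuration added to the plan costs another three days unless one is removed or more people are assigned.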
More recently, a colleague related a similar strategy. An interface problem had caused serious trouble in another department, and his manager was especially irritated at being called on the carpet by a peer. My colleague humbly took responsibility and told his boss that he would work with the other department to be sure it never happened again.
So he printed out his test case inventory of more than 3,000 different scenarios, then scheduled a meeting with the manager of the other department. At the meeting he produced his test cases (several heavy binders), flipped through them to the section dealing with the relevant interfaces, explained carefully what was being tested, and asked for input on how to improve the process.
The other manager was impressed and somewhat intimidated, and eventually confessed that the problem was really on his end for not testing the incoming files. My friend now had a new fan and supporter because he did not run back to his boss claiming exoneration. He let the other manager tell his boss that he was doing a great job and that they had worked through the issue. Everyone saved face.
The lesson here is to remember that people are more important than problems. You can make your point without pointing a finger. And remember that the real challenge is not explaining the last issue, it is preventing the next one.