You don't wait until the day before a software release to test the product. Testing software is a complex process, involving systematic investigation and sustained observation. In this week's column, James Bach argues that evaluating testers is similarly complex. And it shouldn't be put off until the night before the tester's performance review.
I was at the Sixth Software Test Managers Roundtable meeting recently, discussing ways to measure software testers. This is a difficult problem. Count bug reports? Nearly meaningless. Even if it meant something, it would have terrible side effects, once testers suspect they are being measured in that way. It's easy to increase the number of bugs you report without increasing the quality of your work. Count test cases? Voila, you'll get more test cases, but you won't necessarily get more or better testing. What incentive would there be to do testing that isn't easily reduced to test cases, if only test cases are rewarded? Why create complex test cases, when it's easier to create a large number of simple ones?
Partway through the meeting, it dawned on me that measuring testers is like measuring software. We can test for problems or experience what the product can do, but no one knows how to quantify the quality of software, in all its many dimensions, in a meaningful way. Even so, that doesn't stop us from making a meaningful assessment of software quality. Maybe we can apply the same ideas to assessing the quality of testers.
Here are some ideas about that:
To test something, I have to know something about what it can do.
I used to think of testers in terms of a set of requirements that all testers should meet. But then I discovered I was missing out on other things testers might have to offer, while blaming them for not meeting my Apollonian ideal of a software testing professional. These days, when I watch testers and coach them, I look for any special talents or ambitions they may have, and I think about how the project could benefit from them. In other words, I don't have highly specific requirements. I use general requirements that are rooted in the mission the team must fulfill, and take into account the talents already present on the team. If I already have an automation expert, I may not need another one. If I already have a great bug writer who can review and edit the work of the others, I might not need everyone to be great at writing up bugs.
"Expected results" are not always easy to define.
Let's say two testers test the same thing, and they both find the same two bugs. One tester does that work in half the time of the other tester. Who is the better tester? Without more information, I couldn't say. Maybe the tester who took longer was doing more careful testing. Or maybe the tester who finished sooner was more productive. Even if I sit there and observe each one, it may not be easy to tell which is the better tester. I'm not sure what my expectation should be. What I do instead is to make my observations and do my best to make sense of them, weaving them into a coherent picture of how each tester performs. "Making sense of observations" is a much richer concept (and, I think, more useful) than "comparing to expected results."
When I find a problem, I suspend judgment and investigate before making a report.
When I see a product fail, especially if it's a dramatic failure, I've learned to pause and consider my data. Is it reliable? Might there be problems in the test platform or setup that could cause something that looks like a product failure, even though it isn't? When the product is a tester, this pause to consider is even more important, because the "product" is its own programmer. I may see