Exposing False Confidence in Your Tests

[article]
Summary:
Testing can't tell you what's wrong with your code. It can only show what is not wrong with it. And though we cannot possibly conceive everything that might be wrong, it's important to stray from the "happy path." We need test cases that present bogus inputs and assert that they raise exceptions. That's how we can replace our false confidence with true assurance.

Many moons ago I wrote a compiler for a little special-purpose language in C++, but in time it needed to be redone in C#, and we had to mix in some functionality from another utility. This was the perfect situation. I had crystal-clear, rock-solid requirements. The old system had unit tests I could easily port and baseline data covering years' worth of production quirks. This wasn't a one-to-one port, though, because I was replacing my hand-built lexical analyzer with a public domain library.

Because I had drunk the test-driven development Kool-Aid, I embarked upon designing and implementing this project by asking one question of every module: "How can I prove this is working?"

I wrote and ported unit tests before a single line of production code was written. TDD was new to me at the time, and with religious zeal I created a comprehensive suite: not just unit tests, but also acceptance tests that proved every assertion in my statement of work. Soon the dashboard of my test framework showed solid green.

It felt good, and in full gloat mode, I thought, "Nothing could possibly be wrong."

Then I unleashed my stunning work of staggering genius upon the world. What do you suppose happened next?

Some dirty dog in testing found a command-line argument that didn't work—and not something trivial and obscure, but something really basic and obvious. I went into denial. No, it can't be my code. Look, I've got this massive suite of passing tests that prove everything is perfect. Ultimately, that evil person ran the code in front of me and I could not deny the evidence right before my eyes.

How could this be? I had all those tests. They were all green.

My "comprehensive" suite of tests wasn't as comprehensive as I thought it was. In this case, the command-line parser was correct, with unit tests verifying it, and the code to be invoked was correct, with unit tests verifying that, too. The trouble was a misspelling in the connection between the two. Oops.

We all have blind spots. Never do blind spots hurt more than when they correspond to test cases we fail to consider.

The lesson here is that testing could never tell me what was wrong with my code. It could only demonstrate what was not wrong with it. And we cannot possibly conceive of everything that might be wrong.

Straying from the Happy Path

There is a happy path through our code that performs all the requirements perfectly. But there are an infinite number of ways for things to go wrong.

Our unit tests will likely provide 100 percent coverage of the happy path. A good coverage analysis tool will even get us to exercise the catch blocks. Tests will likely verify all the things we were thinking about during development. But what about the unhappy paths through the code?

Murphy's law reminds us that off-happy-path conditions need to be accommodated. Of course, pursuing every possible (and impossible) off-happy-path condition can result in an overbuilt, overengineered solution. Comprehensive testing is an expensive prospect.

We have to strike a balance between Pollyanna and paranoia.

Low-risk systems do not demand that we engineer much paranoia into them. At the other extreme are life-critical systems subject to hostile attack, where a great deal of paranoia is only fitting. You need to understand the risks and discuss them with your stakeholders to right-size the paranoia.

I seek Pareto distributions: the 80/20 situations where I get a lot more assurance from adding only a few more tests. One easy hack is to put on a black hat and write a few malevolent user stories, then assert that the system thwarts them. "As a blackmailer, I want the browser history to ..."
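Here is one way such a story might cash out as a test, assuming an xUnit-style framework. The BrowserHistory class and its token-based access check are made-up stand-ins for whatever system the story targets; the point is that the test asserts the system thwarts the story rather than fulfills it.

```csharp
using System;
using Xunit;

// Hypothetical system under test: a history store that is supposed to
// refuse access to anyone but its owner.
public class BrowserHistory
{
    private readonly string _ownerToken;
    public BrowserHistory(string ownerToken) => _ownerToken = ownerToken;

    public string[] Read(string callerToken) =>
        callerToken == _ownerToken
            ? new[] { "example.com" }
            : throw new UnauthorizedAccessException("Caller is not the owner.");
}

public class BlackHatStoryTests
{
    // The black-hat story becomes a test that the malevolent actor is refused.
    [Fact]
    public void Blackmailer_CannotReadAnotherUsersHistory()
    {
        var history = new BrowserHistory(ownerToken: "alice");

        Assert.Throws<UnauthorizedAccessException>(
            () => history.Read(callerToken: "mallory"));
    }
}
```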

Putting on different hats isn't always easy. The problem with blind spots is that we cannot see them. The ways our software malfunctions may require oddball perspectives.

The only thing worse than one engineer coding a test suite with blinders on is a committee applying groupthink to the same task. You want to assemble a tiger team of the most disagreeable people you can get. Set them to work proving that you are an idiot. I know it doesn’t sound like fun, but hopefully, your disagreeable people will propose test cases that you hadn't considered because they lay within your blind spots.

Let Testing Prove You Wrong

When something hurts, we pull back and flinch. We avoid that pain afterward. It's a sad fact of human nature that when we find a really painful bug, we avoid whatever we did to cause the pain. I've watched testers do stuff to my systems and cringed, thinking, "Don't do that, it'll break ... Oh."

I've caught myself feeling this way too many times. The reaction is understandable, but it's an ineffective way to flush out bugs. One good thing about automating unit tests and writing them before your production code is that it separates you from the pain.

Because every software error is different, we need diverse test cases for handling errors. The worst thing that broken software can do is blithely proceed as if nothing is wrong. Almost as bad is the error message "An error has occurred" with no indication of why. Error messages must point the finger of blame at where things went wrong. When I reject your bad input data, I owe you an explanation of what made it bad, and I owe you a hint about how to fix it.

If you agree with the last paragraph, then you need test cases that present bogus inputs and assert that they raise exceptions.

Each distinct thing we do to make the input bogus should correspond to a unit test asserting the code under test generates an error message that identifies the badness we injected. This can be troublesome because we cannot anticipate everything. The best we can do is anticipate the most common potential errors. Pareto is our friend.
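As a sketch of that pattern, again assuming an xUnit-style framework: a hypothetical AgeField.Parse validator, with one test case per distinct flaw we inject, each asserting both that an exception is raised and that its message names the flaw and hints at the fix.

```csharp
using System;
using Xunit;

// Hypothetical input validator; the real code under test would be whatever
// accepts data from the outside world.
public static class AgeField
{
    public static int Parse(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            throw new ArgumentException("Age is missing; supply a whole number such as 42.");
        if (!int.TryParse(text, out int age))
            throw new ArgumentException($"Age '{text}' is not a number; supply a whole number such as 42.");
        if (age < 0)
            throw new ArgumentException($"Age {age} is negative; ages must be zero or greater.");
        return age;
    }
}

public class BogusInputTests
{
    // One case per distinct way the input is bogus, each asserting that the
    // message identifies the badness we injected.
    [Theory]
    [InlineData("",    "missing")]
    [InlineData("abc", "not a number")]
    [InlineData("-3",  "negative")]
    public void BadInput_RaisesExceptionNamingTheProblem(string input, string expectedClue)
    {
        var ex = Assert.Throws<ArgumentException>(() => AgeField.Parse(input));
        Assert.Contains(expectedClue, ex.Message);
    }
}
```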

Now suppose our code depends on external systems that it talks to asynchronously. Do we have tests for when these externalities let us down at the worst possible moment? Have we thought about what those worst possible moments might be? Do we know disagreeable people who think this way and can suggest test cases?
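Here is a hedged sketch of that kind of test, once more assuming xUnit. The IPriceFeed interface, the QuoteService under test, and the DeadPriceFeed fake are all invented; the idea is that the fake lets the external call fail on demand so we can assert the resulting error names what actually went wrong.

```csharp
using System;
using System.Threading.Tasks;
using Xunit;

// Invented abstraction over an external system we talk to asynchronously.
public interface IPriceFeed
{
    Task<decimal> GetPriceAsync(string symbol);
}

// Invented code under test: it must not blithely proceed when the feed dies.
public class QuoteService
{
    private readonly IPriceFeed _feed;
    public QuoteService(IPriceFeed feed) => _feed = feed;

    public async Task<string> QuoteAsync(string symbol)
    {
        try
        {
            decimal price = await _feed.GetPriceAsync(symbol);
            return $"{symbol}: {price}";
        }
        catch (TimeoutException ex)
        {
            // Point the finger of blame at what actually failed.
            throw new InvalidOperationException(
                $"Price feed timed out while quoting {symbol}.", ex);
        }
    }
}

// A fake feed that lets us down at the worst possible moment.
public class DeadPriceFeed : IPriceFeed
{
    public Task<decimal> GetPriceAsync(string symbol) =>
        Task.FromException<decimal>(new TimeoutException());
}

public class ExternalFailureTests
{
    [Fact]
    public async Task FeedTimeout_ProducesAnErrorThatNamesTheFeed()
    {
        var service = new QuoteService(new DeadPriceFeed());

        var ex = await Assert.ThrowsAsync<InvalidOperationException>(
            () => service.QuoteAsync("ACME"));

        Assert.Contains("Price feed timed out", ex.Message);
    }
}
```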

We might not like engaging with disagreeable people, writing hard-to-implement tests, or adding tests whose value we fail to understand. But that pain is a lot less than the pain we'll feel when an end user reports an issue with our systems long after they have been deployed.

As we face these things we dislike, we can replace our false confidence with true assurance.
