Things that Find Bugs in the Night: Massive Automated Regression Testing

April 16, 2004

Summary

Nighttime is the right time to run those extra tests you've always dreamed of. In this column, Harry Robinson explains how you can wake up to 40,000 additional tests a day at a cost that won't give you nightmares.

Testing Around the Clock
It's 10 PM. Do you know what your test system is doing? Probably nothing. And the same will be true at 11 PM, at midnight, and through the night. That's a shame. Wouldn't it be nice if you could pick up a few million extra tests in the off-hours? Well, you can, and for minimal cost and effort. Imagine you are testing a GUI where it takes a second to feed in an input and evaluate the result. Even at that conservative rate, a single machine could run through more than forty thousand tests in twelve hours! That's pretty respectable. If you are testing an API, you could churn through a billion tests without breaking much of a sweat. That's better than respectable! A few minutes of thinking and programming before leaving the office could pick up several hours of additional testing for free. And the results are well worth the effort. I love to hear a developer gasp "How on earth did you ever find this bug?"

Variety
Of course, it doesn't do much good to keep running the exact same tests over and over. Your machine must have a way to generate new inputs every time. Here are some ideas:

Use a random number generator to choose numbers within equivalence classes.
Feed the application random strings by mapping random numbers to a character set.
Generate every three-letter combination of printable ASCII characters. That's 884,736 strings!
Endlessly press all available buttons. (To find memory leaks, press every button except "Exit".)
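
The first three ideas above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes "printable ASCII" means the 95 characters from space through tilde. Under that assumption there are 857,375 three-character strings; the figure of 884,736 corresponds to a 96-character set.

```python
import itertools
import random

# Assumption: "printable ASCII" is the 95 characters from space (32) to tilde (126).
PRINTABLE = [chr(code) for code in range(32, 127)]


def random_in_class(rng, low, high):
    """Pick a random integer inside one equivalence class [low, high]."""
    return rng.randint(low, high)


def random_string(rng, max_length=16, charset=PRINTABLE):
    """Map random numbers onto a character set to build a random string."""
    length = rng.randint(0, max_length)
    return "".join(charset[rng.randrange(len(charset))] for _ in range(length))


def all_combinations(charset=PRINTABLE, length=3):
    """Lazily yield every string of the given length over the character set."""
    for chars in itertools.product(charset, repeat=length):
        yield "".join(chars)


if __name__ == "__main__":
    rng = random.Random(2004)  # fixed seed so a failing run can be replayed
    print(random_in_class(rng, 1, 12))   # e.g. the "valid month" class
    print(repr(random_string(rng)))
    print(sum(1 for _ in all_combinations()))  # 95**3 = 857375
```

Seeding the generator is a deliberate choice here: when a random input exposes a bug at 3 AM, the same seed reproduces the same input sequence the next morning.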

Creativity
Oddly enough, providing fuel for round-the-clock testing requires creativity more than it requires programming ability. For instance, to generate random strings, I once tweaked an existing password generator program from the Internet. Within 10 minutes, I had generated millions of input strings that looked like "Mh7-ZyQr?9d3W".

As another example, I tested a system that processed natural language, so I needed to feed the parser phrases that sounded almost sensible. I downloaded etexts such as Moby Dick and The Hacker's Dictionary from the Internet. This allowed me to test the application with evocative phrases such as "Come forth from behind your cotton bags!" and "Barfulation! Who wrote this, Quux?" respectively.
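
Clipping phrases out of a text can be sketched as follows. This is a guess at the general idea, not the original program: the function name and parameters are hypothetical, and the sample string is a short stand-in for a downloaded etext. The sketch slices the original text by character offsets rather than re-joining words, so unusual spacing inside a phrase survives intact.

```python
import random
import re


def clip_phrases(text, rng, count=5, min_words=2, max_words=8):
    """Return `count` random runs of consecutive words sliced out of `text`.

    Slicing by character offsets keeps the original whitespace between words.
    """
    spans = [match.span() for match in re.finditer(r"\S+", text)]
    if len(spans) < min_words:
        return []
    phrases = []
    for _ in range(count):
        length = rng.randint(min_words, min(max_words, len(spans)))
        first = rng.randrange(len(spans) - length + 1)
        start = spans[first][0]
        end = spans[first + length - 1][1]
        phrases.append(text[start:end])
    return phrases


if __name__ == "__main__":
    # Stand-in sample; in practice this would be the contents of an etext file.
    sample = "Come forth from behind your cotton bags!  Who wrote this, Quux?"
    for phrase in clip_phrases(sample, random.Random(7)):
        print(repr(phrase))
```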

Elaborations
If you want to increase the effectiveness of these generated tests, you can go several routes:

Have your developers add assertions to their code. Assertions check the internal state of the software and make it possible to immediately detect more subtle bugs like data corruption and stack overflows.
Vary probabilities in your randomness. For instance, if your test program is exercising a dialog and has to choose between clicking buttons and typing characters into fields, favoring one over the other can dramatically change the behavior of your tests.
Use heuristic oracles to determine if results are reasonable. For instance, if you are testing a mortgage calculator, check that the calculated payment is never negative.
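
The last two ideas can be sketched together. Everything named here is hypothetical: `monthly_payment` is a stand-in for the calculator under test (it uses the standard fixed-rate amortization formula), `pick_action` shows one way to bias random choices, and the oracle adds one check beyond the non-negative rule as an assumption, namely that total repayment should not fall below the principal.

```python
import random


def monthly_payment(principal, annual_rate, years):
    """Hypothetical system under test: fixed-rate amortization formula."""
    months = years * 12
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        return principal / months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)


def payment_looks_reasonable(principal, years, payment):
    """Heuristic oracle: cannot prove a payment correct, only flag absurd ones."""
    if payment < 0:  # the rule from the text: never negative
        return False
    # Added assumption: total repaid should be at least the principal.
    return payment * years * 12 >= principal * (1 - 1e-9)


def pick_action(rng, click_weight=0.8):
    """Choose between clicking and typing with an adjustable bias."""
    return rng.choices(["click", "type"], weights=[click_weight, 1 - click_weight])[0]


def run_oracle_checks(rng, trials=1000):
    """Feed random loans to the calculator and collect suspicious results."""
    failures = []
    for _ in range(trials):
        principal = rng.uniform(1_000, 1_000_000)
        annual_rate = rng.uniform(0.005, 0.25)
        years = rng.randint(1, 40)
        payment = monthly_payment(principal, annual_rate, years)
        if not payment_looks_reasonable(principal, years, payment):
            failures.append((principal, annual_rate, years, payment))
    return failures


if __name__ == "__main__":
    rng = random.Random(32)
    print("suspicious results:", len(run_oracle_checks(rng)))
    print("sample actions:", [pick_action(rng) for _ in range(10)])
```

Changing `click_weight` from 0.8 to 0.2 turns a test run that mostly navigates into one that mostly enters data, without touching the rest of the program.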

Costs
What are the costs involved with this approach? Well, other than stretching a few brain cells, there aren't many. That's a big advantage for those who are time-rich but resource-strapped. The economics are very tempting: A computer that is sitting there doing nothing is an idle resource.

It may even make sense to buy machines specifically for running tests around the clock. STQE's 2003 salary survey says that a typical tester earns around $50,000 per year, not including benefits, office, vacation, etc. You can buy a good computer system (Pentium 4, no monitor) for $700 that you can expect to last two years, or $350 per machine per year. Five computers that did nothing but run tests day and night would cost about $1,750 annually, but at a rate of one test per second, those systems would generate over 150 million tests in that year.
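
The arithmetic behind these figures is easy to check. The sketch below uses only the numbers quoted above and assumes the machines run nonstop for a 365-day year; it ignores electricity, space, and maintenance. The variable names are illustrative.

```python
MACHINE_PRICE = 700            # dollars per machine, from the text
MACHINE_LIFETIME_YEARS = 2     # expected service life, from the text
MACHINES = 5
TESTS_PER_SECOND = 1           # the conservative GUI rate

SECONDS_PER_YEAR = 365 * 24 * 60 * 60                               # 31,536,000

tests_per_night = TESTS_PER_SECOND * 12 * 60 * 60                   # 43,200
annual_cost_per_machine = MACHINE_PRICE / MACHINE_LIFETIME_YEARS    # 350.0
annual_cost = annual_cost_per_machine * MACHINES                    # 1750.0
tests_per_year = MACHINES * TESTS_PER_SECOND * SECONDS_PER_YEAR     # 157,680,000
cost_per_million_tests = annual_cost / (tests_per_year / 1_000_000)

if __name__ == "__main__":
    print(f"Tests in one 12-hour night: {tests_per_night:,}")
    print(f"Annual hardware cost:       ${annual_cost:,.0f}")
    print(f"Tests per year:             {tests_per_year:,}")
    print(f"Cost per million tests:     ${cost_per_million_tests:.2f}")
```

Under these assumptions the hardware works out to roughly eleven dollars per million tests.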

Benefits
Your machines only have to find a few bugs to make the investment worthwhile. Most interestingly, they will find bugs that you might never have found otherwise.

When I wrote the small test program to send in every possible character combination, the application choked on strings such as "!M!". Who would have tested for that string manually?
The program that clipped words and phrases out of documents on the Internet caused many crashes in my application. The most unusual came from inputs like "Type   5" (the word "Type" followed by 3 spaces and the numeral "5"). Again, I would never have hit that particular string by testing manually.

These strings found bugs that customers might have found through other, more common, inputs. And since these bugs caused crashes, they posed a security problem as well as an annoyance to the customer.

Because these input generation programs are lightweight, we can push them far forward into the development cycle. That means that many bugs are stopped early, while the cost of fixing them is low. You can even use these techniques to extend the reach of your exploratory testing—do you think that the application might have a problem handling alphabetic characters? Try a few likely cases by hand, then write a small program to generate them by the millions overnight. It becomes a pleasure to come in the next day just to see what bugs might have been caught by the system.

Finally, one benefit that shouldn't be overlooked is that finding time for testing is no longer an issue: If you can think of the test, you can run it. So, before leaving work tonight, take a few minutes to cobble together a small program to pummel your favorite application. Then set it running and go home. In the morning, you can see how the application fared.
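
A program of that kind can be very small. The sketch below is one possible shape, not a prescribed tool: `pummel`, `toy_target`, and `make_short_string` are hypothetical names, and the toy target is a made-up stand-in that fails on one particular string. The loop runs until a time budget expires and records every input that raises an exception.

```python
import random
import time


def toy_target(text):
    """Hypothetical stand-in for the application under test."""
    if "!M!" in text:
        raise ValueError("parser choked")
    return len(text)


def make_short_string(rng):
    """Generate a three-character input from a deliberately tiny alphabet."""
    return "".join(rng.choice("!M a") for _ in range(3))


def pummel(target, make_input, budget_seconds, seed, max_runs=None, max_failures=100):
    """Call `target` on generated inputs until time, run, or failure limits hit."""
    rng = random.Random(seed)  # record the seed so the run can be replayed
    deadline = time.monotonic() + budget_seconds
    runs = 0
    failures = []
    while time.monotonic() < deadline and len(failures) < max_failures:
        if max_runs is not None and runs >= max_runs:
            break
        test_input = make_input(rng)
        runs += 1
        try:
            target(test_input)
        except Exception as error:  # a crash is exactly what we are hunting for
            failures.append((runs, test_input, repr(error)))
    return runs, failures


if __name__ == "__main__":
    # Use budget_seconds=12 * 60 * 60 for an overnight run.
    runs, failures = pummel(toy_target, make_short_string, budget_seconds=2, seed=49)
    print(f"{runs} runs, {len(failures)} failures")
    for run_number, test_input, error in failures[:5]:
        print(run_number, repr(test_input), error)
```

For a real overnight run, writing each failure to a file as it happens is safer than holding results in memory, since the application or the machine may not survive until morning.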

In the meantime, sleep tight and don't let the bad bugs bite.

Further Reading

Don Slutz's "Massive Stochastic Testing of SQL" was a path-blazing paper on using this random generation approach to test databases.
James Whittaker is a pioneer in using probabilities to generate test data; see his paper "Stochastic Software Testing."

About The Author

Harry Robinson

Harry Robinson is a Software Engineer in Test for Google. He coaches teams around the company in test generation techniques. His background includes ten years at AT&T Bell Labs, three years at Hewlett-Packard, and six years at Microsoft before joining Google in 2005. While at Bell Labs, he created a model-based testing system that won the 1995 AT&T Award for Outstanding Achievement in the Area of Quality. At Microsoft, he pioneered the test generation technology behind Test Model Toolkit, which won the Microsoft Best Practice Award in 2001. He holds two patents in software test automation methods, maintains the Web site Model-based Testing, and speaks and writes frequently on software testing and automation issues.