As Test Automation Matures, So Do False Positives


In life and in test automation, a lot of things change as you mature—the challenges you face, the types of failures you experience, and the best ways to solve them. Let’s skip the “life lessons” and focus on the test automation angle here

In life and in test automation, a lot of things change as you mature—the challenges you face, the types of failures you experience, and the best ways to solve them. Let’s skip the “life lessons” and focus on the test automation angle here.

From the cradle to the grave, most test automation efforts are plagued by false positives: tests that fail even though the application is behaving correctly. This is a tremendous pain. First, someone needs to figure out why the test failed. This might involve manually checking the related application functionality, inspecting the test automation, reviewing the test data, checking whether every element of the test environment is up and working as expected…maybe even all of the above. Once the exact problem is found, it needs to be fixed. Then the test needs to be rerun. If it still fails, rinse, and repeat.

At best, this causes a tremendous resource drain that can quickly eat into the ROI you were probably hoping to achieve with test automation. Tricentis has found that a staggering 72 percent of test failures are actually false positives. Think about all the tests you have, all the test failures reported for each run, and do the math. False positives consume a lot of resources that could be dedicated to much more valuable work.

At worst, it undermines your test automation initiative altogether. Once you start ignoring false positives, you’re on a slippery slope. Developers start assuming that every issue exposed by testing is a false positive—and testers need to work even harder to get critical issues addressed. Moreover, if stakeholders don’t trust test results, they’re not going to base go/no-go decisions on them.

But enough about why false positives are such a pain. How do you relieve that pain? Well, that depends. All false positives aren’t the same. A team that’s just getting started with test automation faces different types of false positives than a team that’s doing more extensive and advanced test automation. Since the false positives’ causes vary, so should your strategies for eliminating them.

At Tricentis, we’ve been collecting data on false positives at Global 2000 companies for over a year. We recently analyzed this data to determine what’s causing the failures at various stages of test automation maturity, and what strategies help eliminate those failures. Here’s what we found…


Grappling with the Basics

At first, it’s all about the basics: from both an automation and failure perspective. When you begin creating test automation, you typically start with the easy things, like searching for data or creating a new object (such as an account). At this point, your test automation might break simply because you’re still learning the nuances of whatever automation technology or approach was selected. These basic automation issues are actually the source of most false positives—and they’re rather frustrating because they thwart your initial test automation victories.

Of course, some approaches have a steeper learning curve than others, and some tend to fail less than others. In any case, you’ll always want to set aside some time for mastering the basics. At the very least, research the relevant best practices for steering the specific types of applications and interfaces you’re working with, different ways of specifying basic test data, and the various validation options that are available.


Growing Pains

Before long, you’ll need to progress from object creation tasks to object administration tasks. I’ve written before about how risk = frequency * damage. For example, in an online banking application, creating objects has high frequency but minimal damage. On the other hand, administration tasks (e.g., reversing fraudulent charges) have a low frequency, but much higher damage. Why does this matter here? Because to really cover your risk, you need to cover these administration tasks as well as the easier creation tasks. And, when teams move on to automating these more complex scenarios, test data emerges as the most likely source of false positives.

Take the example of reversing fraudulent charges. To automate this, you’d first have to produce the “state” where an account was created and had a history of charges. Then, the test would need to indicate that certain charges were fraudulent and reduce the amount due accordingly. Getting such a test to run automatically once is challenging. Getting it to run repeatedly without triggering false positives requires a pretty advanced mastery of stateful test data management. Our study found that test data management issues were to blame for 23 percent of false positives.

Test environment issues also emerge as your tests climb higher up the so-called agile test pyramid and cross through more systems. Given that the average application under test interacts with 52 dependent systems, an end-to-end test is likely to touch something that’s unavailable, modified, or just timing out before your test gets the expected response. You know where this is going: more false positives. 16 percent of false positives can be traced back to test environment issues. Service virtualization—ideally stateful—is the key to eliminating these issues. Simulate the behavior you need so that it’s configured correctly, consistent, and always available. Of course, you’ll want to test against the actual systems before you release. But using these simulations for your daily continuous testing will enable fast feedback while preventing the headache of false positives (e.g., if your test happens to run when a dependent system is being redeployed).  


Fickle Flakiness

Finally, the most insidious and difficult-to-tame source of false positives: test flakiness. Google defines a "flaky" test result as “a test that exhibits both a passing and a failing result with the same code.” In many cases, such failures do indeed point to a problem in the code. However, a good percentage are false positives.

This breed of false positives usually emerges once you’ve conquered all the aforementioned types of false positives and truly integrated testing into your CI/CD pipeline. Often, the flakiness is the result of asynchronous application behavior that you didn’t consider when you set up the automation. Other causes could be concurrency issues, network issues, or test order dependencies. I have to admit: this is a tricky one to solve. Using pattern recognition and machine learning (ML) to automatically identify (then rerun) flaky tests with a high likelihood of being false positives seems promising…but the devil is in the details.

In fact, I think applying AI and ML to determine if test failures are false positives is an extremely feasible—and valuable—endeavor. I know, it’s supposed to be great to learn from your own failures. But wouldn’t it be nicer to forego the suffering and just benefit from the lessons learned? I think we’re extremely close to reaching that point…in test automation, if not in life.

About the author

StickyMinds is a TechWell community.

Through conferences, training, consulting, and online resources, TechWell helps you develop and deliver great software every day.