Flaky tests pass or fail unexpectedly for reasons that appear random. It can be easy to use flaky tests to discredit automated end-to-end testing, but they also can tell you things—about both the application and your team dynamic. Josh Grant gives some technical and human examples of times flaky tests helped his testing efforts.
Imagine this: You are a programming tester who creates automated end-to-end tests for a web application. You use good, modern technology—say, Selenium WebDriver. You consider code a thing of craft, using page objects to separate your tests from application logic. You make use of virtualization and adequately powered hardware. You also understand that end-to-end automation is a team effort, so you work in concert with application developers, testers, and managers to get the most value out of your work. And yet for all your good work, you still have a recurring problem: flaky tests.
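To make the page object idea concrete, here is a minimal sketch in Python (the class name, locators, and URL are hypothetical, not from any particular application):

```python
# A minimal page object sketch (names, locators, and URL are hypothetical).
# The page object wraps WebDriver calls so tests read as user intentions
# rather than as sequences of element lookups.
from selenium import webdriver
from selenium.webdriver.common.by import By


class LoginPage:
    def __init__(self, driver):
        self.driver = driver

    def open(self, base_url):
        self.driver.get(f"{base_url}/login")
        return self

    def log_in(self, username, password):
        self.driver.find_element(By.ID, "username").send_keys(username)
        self.driver.find_element(By.ID, "password").send_keys(password)
        self.driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()


# A test then talks to the page, not to the DOM:
driver = webdriver.Chrome()
LoginPage(driver).open("https://example.test").log_in("alice", "secret")
driver.quit()
```

Because the test talks to the page object rather than to the DOM, a change to the login markup means updating one class instead of every test that logs in.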
These are tests that pass or fail unexpectedly for reasons that appear random. Flaky tests become even worse as test suites grow and more areas of an application are covered. Eventually there comes a temptation to throw up your hands, yell, "This is stupid!" and throw the whole end-to-end test suite into the trash bin.
But are flaky tests really a problem?
It can be easy to shrug off flaky tests and use them to discredit automated end-to-end testing. But they can also tell you things—about both the application and your team dynamic. I'll share some technical and human examples of times flaky tests helped my testing efforts; they might help yours, too.
The Technical Side
There are a few common sources of flakiness in WebDriver-based tests. One of the main culprits is synchronization (or a lack thereof). Web applications have many layers that affect performance, including network connection speed, HTTP handling, source rendering, and computer processing resources. As a result, some operations may vary slightly in timing during different runs of end-to-end scenarios. A button might not appear quickly enough, or a dialog box might not disappear fast enough for an automated script to complete.
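To see how that plays out in code, consider a hypothetical brittle step in Python, the kind of line that passes on a fast run and fails on a slow one (the URL and locator are made up for illustration):

```python
# A hypothetical brittle step: the element is looked up the instant
# navigation starts. Whether this passes depends on network speed and
# rendering time, so it fails intermittently.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.test/dashboard")
# Raises NoSuchElementException whenever the button renders a beat too late.
driver.find_element(By.ID, "save-button").click()
driver.quit()
```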
One solution is to add wait statements to synchronize script steps with the application. This might seem like a hack to avoid flakiness, but it can also be an oracle for performance issues in your application. If some areas consistently need more waits, or longer ones, it could be an indication of poor performance—particularly client-side performance—in those areas. On one team I worked with, a set of automated end-to-end tests failed inconsistently, but the failures always involved the same feature. When I talked to the developers, it turned out that area had front-end issues caused by some bad coding practices. Flaky tests picked up on this problem, if indirectly.
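Here is the same hypothetical step with an explicit wait added, a sketch of the synchronization described above. WebDriverWait polls the condition until it holds or the timeout expires, absorbing normal timing variation:

```python
# The same step synchronized with an explicit wait (locator and URL are
# still hypothetical). The wait polls until the button is clickable or
# the timeout expires, instead of looking it up exactly once.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.test/dashboard")
save = WebDriverWait(driver, timeout=10).until(
    EC.element_to_be_clickable((By.ID, "save-button"))
)
save.click()
driver.quit()
```

This is also where the oracle comes in: if one page only passes with a 30-second timeout while the rest of the suite is happy with five, that gap is worth raising with your developers.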
Another source of flaky tests I've seen is accidental load testing. As automated end-to-end test suites grow, not only does the amount of test code grow, but more tests are executed against the application under test. This usually means test suites are reorganized to run at the same time (in parallel or concurrently) to cut down on total runtime. While helpful for testers and developers, this can also have the side effect of putting a large load on your application, creating an unintended load test.
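As a sketch of how this happens, suppose the same scenario is run by eight workers at once (the worker count and URL are hypothetical; many test runners, such as pytest-xdist, parallelize in a similar fashion):

```python
# A sketch of how a parallel test run becomes an accidental load test:
# eight workers means eight simultaneous browser sessions hitting the
# application at once.
from concurrent.futures import ThreadPoolExecutor
from selenium import webdriver


def run_scenario(worker_id):
    driver = webdriver.Chrome()
    try:
        driver.get("https://example.test")
        # ... the actual end-to-end steps would go here ...
    finally:
        driver.quit()


with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(run_scenario, range(8)))
```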
Automated end-to-end tests that run perfectly fine in series might get flaky when run concurrently. On one project I worked on, tests ran just fine individually, but when we first tried running them in parallel, a few failed in seemingly random ways. After some debugging, one of my teammates found that when run in parallel, our tests would all try to log in with the same admin user the instant the run started, resulting in around eight simultaneous logins by the same user. The application was not prepared for this, and we found out the hard way—but this flakiness was beneficial and helped us design better tests.
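One fix we could have reached for (a sketch assuming a pytest-based suite run under pytest-xdist; the account-naming scheme is hypothetical) is to give each parallel worker its own account instead of sharing a single admin login:

```python
# A sketch: derive a per-worker account so parallel tests stop colliding
# on one shared admin login. pytest-xdist sets PYTEST_XDIST_WORKER (for
# example, "gw0", "gw1", ...); it is unset on a single-process run.
import os

import pytest


@pytest.fixture
def test_user():
    worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
    # Hypothetical convention: accounts admin-gw0, admin-gw1, ... are
    # provisioned ahead of time in the test environment.
    return {"username": f"admin-{worker}", "password": "not-a-real-secret"}
```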