Imagine buying a very expensive car and driving it daily without ever having routine maintenance performed. Even worse, imagine doing this while glancing at the vehicle’s gauges only once every few weeks, ignoring any red or yellow warning lights that indicate something is wrong. I doubt many intelligent folks would do this, but I see many test managers doing something just as expensive with assets they have spent good money building.
Just like a vehicle or any other complex machine with moving parts, test automation requires regular maintenance to keep it in a running state. And just as with vehicles, failing to perform routine maintenance of your test automation suite causes a buildup of minor issues, which, over time, creates compounding and expensive failures. As these failures pile up unresolved, they become exponentially more difficult to fix, as engineers are forced to first spend time unraveling the accumulated issues.
In my experience, the sooner diagnostic work begins after an automation failure appears, the less actual time it takes to resolve the issue. Issues here may be application bugs or something external to the application under test (AUT), such as environmental failures or problems with the tests themselves. Applications and their targeted test automation have a tendency, particularly when not watched regularly, to diverge over time. I’ve heard this phenomenon called “drift”: the slow divergence between a static object, such as idle or unwatched test automation, and a moving object, such as an AUT undergoing active development. The key is to actively reduce the time between an issue occurring (a bug introduced into the AUT, an environmental failure, etc.) and the start of troubleshooting; reducing that time, in turn, reduces the time needed to identify the source of the issue and get it resolved. To state it more succinctly: faster identification and faster troubleshooting lead to faster resolution.
One of the best ways to reduce the time from the introduction of an issue to its detection and the start of troubleshooting is to run your test automation at least five days or nights each week, even if an application is not actively going through major development. Built correctly and run often, these tests serve as a heartbeat for both the AUT and the test environment. Running your test automation on a frequent schedule allows your engineers and testers to raise defects more quickly, and allows your team to identify and reduce drift before it becomes unmanageable. This quick identification of bugs and reduced drift will generally also increase your customers’ and testers’ confidence in the test automation. Even applications not undergoing active development benefit from at least daily runs, as these still have a paradoxical tendency to drift over time.
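To make the "heartbeat" idea concrete, here is a minimal sketch of a weekday runner. The command `["pytest", "tests/"]` is a hypothetical placeholder for your suite's real entry point, and the script assumes it is triggered nightly by cron or a CI scheduler:

```python
from datetime import date
from typing import Optional
import subprocess

def should_run(today: date) -> bool:
    """Run at least five days a week: Monday (0) through Friday (4)."""
    return today.weekday() < 5

def heartbeat(today: Optional[date] = None) -> bool:
    """Run the suite if today is a scheduled day; True means a healthy heartbeat."""
    today = today or date.today()
    if not should_run(today):
        return False
    # A non-zero exit code is a failed heartbeat that needs triage
    # (test data, environment, infrastructure, test defect, or real bug).
    result = subprocess.run(["pytest", "tests/"])
    return result.returncode == 0
```

The scheduling logic is kept separate from the suite invocation so the same script works whether a cron job, a CI pipeline, or a person kicks it off.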
In addition to running your automated tests each workday, I recommend the following maintenance activities to keep your test automation in optimal shape:
- Run as much of your automation suite as possible after each release to the test environment
- Read the results of your automated tests each time they execute
- Assign people to act on the results immediately, raising bugs quickly or resolving each failure as soon as possible. A failed test can mean any one of several things, and identifying the cause often requires a set of human eyes
While the following list is far from exhaustive, some of the more common causes of failing tests are:
- Test data failure: some change occurs in the test data creating unpredictability in expected results
- Test environment failure: something occurs in the environment causing a failed AUT state or unpredictable behavior
- Automation infrastructure failure: something occurs in the test automation infrastructure causing an unpredictable or failed execution environment
- Defect in the automated test: a failure to update an expected result within a test when the AUT changed due to a new requirement
- Application under test failure: a “real” code defect
Track every failed test, identify the main culprits among the first four types of failures, and actively work to eliminate their root causes. The goal is obviously to find as many of the fifth category, AUT defects, as possible while minimizing instances of the first four.
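One lightweight way to track this is to record a cause for every failed test and tally the results, so the main non-AUT culprits stand out. The sketch below encodes the five categories above as an enum; the names are illustrative rather than taken from any particular tool:

```python
from collections import Counter
from enum import Enum

class FailureCause(Enum):
    TEST_DATA = "test data failure"
    ENVIRONMENT = "test environment failure"
    INFRASTRUCTURE = "automation infrastructure failure"
    TEST_DEFECT = "defect in the automated test"
    AUT_DEFECT = "application under test failure"

def tally_failures(causes):
    """Count failed tests per cause; the non-AUT total is the noise to drive out."""
    counts = Counter(causes)
    noise = sum(n for cause, n in counts.items()
                if cause is not FailureCause.AUT_DEFECT)
    return counts, noise
```

Reviewing the tally after each run (or each week) shows whether the non-AUT noise is actually trending toward zero.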
However, no automation effort is perfect out of the box and even the best-implemented test automation will experience instances of the first three categories, particularly at the beginning of the automation effort. See my article “Test Automation Stumbling Blocks: Foundational Problems” for more on eliminating or reducing the first three causes of failure. Multiple instances of the fourth cause of failed tests, defects in the automated tests themselves, may indicate a process or communication issue around changing application requirements.
The intent here is to drive out the causes of the first four types of errors and, over time, reduce these errors to as close to zero as possible. In a perfect world, every failed test would represent a defect in the AUT. While this model is likely impossible to achieve in the real world, the closer you and your team can get to this model, the more effective your test automation will be and, in turn, the higher your customers’ confidence level in the test automation will be. Reducing the first four causes of test failures also results in faster communication of “real” bugs to developers. This is a good thing regardless of whether you are running an agile or traditional development process.
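As a rough way to quantify progress toward that ideal, one might track the share of failures that are real AUT defects. This is a hypothetical metric sketched for illustration, where 1.0 is the perfect world in which every failed test is a genuine bug:

```python
def aut_signal_ratio(aut_defects: int, total_failures: int) -> float:
    """Fraction of failed tests that are real AUT defects; higher is better."""
    if total_failures == 0:
        return 1.0  # no failures at all: treat as perfect signal
    return aut_defects / total_failures
```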
On the surface, this model may seem more expensive than running automated tests less frequently, especially for applications not undergoing active development. I have found the opposite: by running all of your automated tests regularly and aggressively driving out non-AUT failures, the test automation becomes much easier to maintain over time, while confidence in it increases across the team, including testers, developers, and product owners.