Test Automation Stumbling Blocks: Ignoring Routine Maintenance

[article]
Summary:

Just like a vehicle or any other complex machine with moving parts, test automation requires regular maintenance to keep it in a running state. And just as with vehicles, failing to perform routine maintenance of your test automation suite causes a buildup of minor issues, which, over time, creates compounding and expensive failures.

Imagine buying a very expensive car and driving it daily without having routine maintenance performed. Even worse, imagine doing this while only glancing at the vehicle's gauges once every few weeks, ignoring any red or yellow warning lights indicating that something is wrong. I doubt many intelligent folks would do this, but I see many test managers doing something just as expensive with assets they have spent good money building.

Just like a vehicle or any other complex machine with moving parts, test automation requires regular maintenance to keep it in a running state. And just as with vehicles, failing to perform routine maintenance of your test automation suite causes a buildup of minor issues, which, over time, creates compounding and expensive failures. As these failures pile up unresolved, they become exponentially more difficult to fix as engineers are forced to first spend time unraveling the issues.

From my observations, the sooner diagnostic work begins after an automation failure, the less total time is spent resolving the issue. Issues as defined here could be application bugs or something external to the application under test (AUT), such as environmental failures or problems with the tests themselves. Applications and their targeted test automation have a tendency, particularly when not being watched regularly, to diverge over time. I’ve heard this phenomenon called “drift,” and it describes the slow separation between a constant object, such as idle or unwatched test automation, and a moving object, such as an AUT undergoing active development. The key is to actively reduce the amount of time between an issue occurring (a bug introduced into the AUT, an environmental failure, etc.) and the start of the initial troubleshooting. Reducing that time will, in turn, reduce the time needed to identify the source of the issue and get it resolved. To state it more succinctly: faster identification and faster troubleshooting lead to faster resolution.

One of the best ways to reduce the time between the introduction of an issue, its detection, and the start of troubleshooting is to run your test automation at least five days or nights each week, even if an application is not actively going through major development. Built correctly and run often, these tests serve as a heartbeat for both the AUT and the test environment. Running your test automation on a frequent schedule allows your engineers and testers to raise defects more quickly and also allows your team to actively identify and reduce drift, eliminating it before it becomes unmanageable. This quick identification of bugs and reduced drift will generally also increase your customers’ and testers’ confidence in the test automation. Even applications not undergoing active development have a paradoxical tendency to drift from their tests over time, so I recommend running the automation at least daily in those cases as well.
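
To make this concrete, here is a minimal sketch of such a nightly heartbeat run, assuming a pytest-based suite; the paths, file names, and schedule are illustrative rather than prescriptive:

    # nightly_run.py -- a minimal sketch, assuming a pytest-based suite.
    # Schedule it with cron, e.g.:  0 2 * * 1-5  python3 nightly_run.py
    # (paths, file names, and the schedule are illustrative, not prescriptive)
    import datetime
    import pathlib
    import subprocess

    RESULTS_DIR = pathlib.Path("results")          # hypothetical output location
    HEARTBEAT_LOG = RESULTS_DIR / "heartbeat.log"  # one line per run, pass or fail

    def main() -> int:
        RESULTS_DIR.mkdir(exist_ok=True)
        stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

        # Run the whole suite; --junitxml leaves a machine-readable record for triage.
        proc = subprocess.run(
            ["pytest", "tests/", f"--junitxml={RESULTS_DIR}/nightly-{stamp}.xml"],
            capture_output=True,
            text=True,
        )

        # Always append a heartbeat line: a missing line means the schedule itself
        # broke, which is exactly the kind of quiet drift to catch early.
        with HEARTBEAT_LOG.open("a") as log:
            log.write(f"{stamp} exit={proc.returncode}\n")

        return proc.returncode

    if __name__ == "__main__":
        raise SystemExit(main())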

In addition to running your automated tests each workday, I recommend the following maintenance activities to keep your test automation in optimal shape:

  • Run as much of your automation suite as possible after each release to the test environment
  • Read the results of your automated tests each time they execute
  • Have resources assigned to act on the results immediately in order to raise bugs quickly or resolve each failure as soon as possible. Each failed test can mean one of a number of things, and identifying the cause of a failed test often requires a set of human eyes (a small triage sketch follows this list)
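
As a rough illustration of the “read the results every run” step, the sketch below assumes the suite writes JUnit-style XML (the report path is hypothetical) and simply surfaces the failed tests so a person can begin triage right away:

    # triage_results.py -- list failed tests from a JUnit-style XML report so a
    # person can start triage immediately. The report path is hypothetical.
    import sys
    import xml.etree.ElementTree as ET

    def failed_tests(junit_xml_path):
        """Yield (test name, failure message) for each failed or errored test case."""
        root = ET.parse(junit_xml_path).getroot()
        for case in root.iter("testcase"):
            for problem in case.findall("failure") + case.findall("error"):
                name = f'{case.get("classname", "")}.{case.get("name", "")}'
                yield name, (problem.get("message") or "").strip()

    if __name__ == "__main__":
        # Usage: python3 triage_results.py results/nightly-20240101-020000.xml
        failures = list(failed_tests(sys.argv[1]))
        if not failures:
            print("All tests passed -- the heartbeat is healthy.")
        for name, message in failures:
            # A human still decides which cause of failure applies (see the list below).
            print(f"NEEDS TRIAGE: {name}: {message}")

The script deliberately stops at surfacing failures; deciding which cause applies is still a human judgment call.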

While this list is far from exhaustive, some of the more common causes of failing tests are:

  1. Test data failure: some change occurs in the test data, creating unpredictability in expected results
  2. Test environment failure: something occurs in the environment causing a failed AUT state or unpredictable behavior
  3. Automation infrastructure failure: something occurs in the test automation infrastructure causing an unpredictable or failed execution environment
  4. Defect in the automated test: a failure to update an expected result within a test when the AUT changed due to a new requirement
  5. Application under test failure: a “real” code defect

Track every failed test, identify the main culprits behind the first four types of failures, and actively work to reduce their root causes. The goal, obviously, is to find as many of the fifth category, AUT defects, as possible while minimizing instances of the other four.
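
One lightweight way to do that tracking, sketched below against a hypothetical CSV of triaged failures (the file name and columns are assumptions, not a standard), is to tally failures by cause so the non-AUT categories can be watched and driven down:

    # failure_causes.py -- tally triaged failures by cause from a hypothetical CSV
    # written during triage (columns: date,test,cause). Neither the file name nor
    # the column layout is a standard; they are assumptions for this sketch.
    import csv
    from collections import Counter

    CAUSES = [
        "test data",
        "test environment",
        "automation infrastructure",
        "defect in the automated test",
        "AUT defect",
    ]

    def tally_causes(triage_log_path):
        counts = Counter()
        with open(triage_log_path, newline="") as handle:
            for row in csv.DictReader(handle):
                counts[row["cause"]] += 1
        return counts

    if __name__ == "__main__":
        counts = tally_causes("triage_log.csv")
        for cause in CAUSES:
            print(f"{cause:30s} {counts.get(cause, 0)}")
        noise = sum(count for cause, count in counts.items() if cause != "AUT defect")
        print(f"Non-AUT failures to drive toward zero: {noise}")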

However, no automation effort is perfect out of the box, and even the best-implemented test automation will experience instances of the first three categories, particularly at the beginning of the automation effort. See my article “Test Automation Stumbling Blocks: Foundational Problems” for more on eliminating or reducing the first three causes of failure. Multiple instances of the fourth cause of failed tests, defects in the automated tests themselves, may indicate a process or communication issue around changing application requirements.

The intent here is to drive out the causes of the first four types of errors and, over time, reduce these errors to as close to zero as possible. In a perfect world, every failed test would represent a defect in the AUT. While this model is likely impossible to achieve in the real world, the closer you and your team can get to this model, the more effective your test automation will be and, in turn, the higher your customers’ confidence level in the test automation will be. Reducing the first four causes of test failures also results in faster communication of “real” bugs to developers. This is a good thing regardless of whether you are running an agile or traditional development process.

While this model may seem, on the surface, to be more expensive than running automated tests less frequently, especially for applications not undergoing active development, I have found the opposite: when you run all of your automated tests regularly and aggressively drive out non-AUT failures, the test automation becomes much easier to maintain over time, and confidence in it grows across the team, including testers, developers, and product owners.

User Comments

4 comments
Jim Hazen

Don,

Right on target. Test automation and its assets are a form of code that needs to be maintained over the long haul. Management and other less experienced people do not realize that and will fall into the 'drift' (or sometimes riptide) of automation issues, as you have stated.

This is one of the big problems with automation: companies do not, or may not want to, understand the recurring costs associated with having test automation (of any type and/or level). This cost can be small and manageable, or it can become huge and uncontrollable.

This cost, rework, is the same as for any type of software. Rework is the number one killer of projects, and of companies in some instances. It takes people with the insight to see it and follow up on it, and the management support to keep it under control. Otherwise, that Corvette you bought a few months back is going to throw a rod.

September 2, 2013 - 4:45pm
Timothy Western

I agree with this article, but I feel the need to ask some context-sensitive questions. Is running automation once or twice daily sufficient? Is there such a thing as running automation checks too often? How much time should be allocated for the inevitable maintenance that comes up? Lastly, what is your advice about failures that may be transient, or that change with respect to load or usage at any given time?

October 11, 2013 - 3:07pm
DON PRATHER

Hi Timothy! Good questions. I'll start from the top. Is there such a thing as running automation checks too often? The optimum frequency to run automation checks will be very project-specific and will depend on the individual cadence of your project team(s). Running the tests daily is the minimum I would consider, even for an application not going through change. In my opinion, the "optimal" situation is to have the automated checks occur immediately following a deployment to your test environment. Remember, what we are trying to do is catch a defect as close to the moment it occurs as possible. So, if your team is releasing to QA once or twice daily, then having the tests run once or twice a day makes sense. If your project team follows a continuous integration methodology and releases to QA many times throughout the day, running the tests once for each deployment (preferably kicked off automatically) makes good sense. To answer your question more directly, yes, it is possible to run the automated checks too often. Each time you run a test or test suite, there will be some finite amount of resource time needed to look at the results, so if your tests run too often, you will have to dedicate more resource time to watching the results. The key is to balance the cost of running the tests more often (resource time, potential communication overload, etc.) with the benefits and determine the optimum cadence for each project.
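
Purely as a sketch of what "kicked off automatically" can look like, assuming a pytest suite and a deployment pipeline that can call a webhook (the port, paths, and command below are made up for illustration):

    # deploy_hook.py -- run the suite whenever the deployment pipeline POSTs to
    # /deployed. Port, path, and the pytest command are illustrative only.
    import subprocess
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class DeployHookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Drain any request body so the connection stays well behaved.
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            if self.path != "/deployed":
                self.send_response(404)
                self.end_headers()
                return
            # Fire and forget; the results still need a human reader afterward.
            subprocess.Popen(["pytest", "tests/", "--junitxml=results/post-deploy.xml"])
            self.send_response(202)
            self.end_headers()
            self.wfile.write(b"test run started\n")

    if __name__ == "__main__":
        # Point the deploy job at http://<test-host>:8000/deployed
        HTTPServer(("", 8000), DeployHookHandler).serve_forever()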

2. How much time should be allocated for the inevitable maintenance that comes up? This will depend greatly on the maturity and stability of three factors: your test environment, your test data, and the tests themselves. I'll start with the tests. When automation is first put into use, there will be tests that don't do exactly what you expect them to do, invalid bugs will be detected, and the tests will need to be tweaked. As this tweaking occurs, the tests will become more fit for purpose and fewer and fewer invalid bugs will be detected. Over time, the tests will become mature enough that they are targeting the right requirements and more and more of the bugs detected will be valid application errors. On test data and the test environment, it is important to maintain a laser-like focus on these items and get your test data and test environment as stable and predictable as possible. It is frustrating and time-consuming to raise bugs only to find out that your test data was changed without you knowing it or that your data expired and needs to be reset. The stability of the environment will also play a huge factor in the maintenance time required. As these three items mature, you'll spend less and less time fixing them and your folks will be able to focus only on reviewing results and raising valid bugs; the overall maintenance time will optimize (decrease to its minimum possible amount).

3. Advice about failures that may be transient, or changing with respect to load or usage at any given time. These are the most difficult failures to nail down as they are often moving targets. The important thing is to try to identify the variables in play when the tests return different results. If the variable is load, it may be beneficial to run your tests at low load and again with a higher load applied via a load testing tool. Truly transient bugs often require many iterations to nail down and are best dealt with via multiple rounds of manual testing (rather than automated) in an attempt to identify and isolate the variable(s) in play. Automated tests work best with your variables frozen. When you find yourself needing to change variables in succession, manual testing may be the key, at least until the variable is found and eliminated.

October 11, 2013 - 4:20pm
Nadia Linares

Thank you Don!

Very useful article: short and to the point. 

I would also add that maintenance time increases when you develop in a test-driven development environment. When the AUT changes relatively often, the tests need to be adjusted right away to stay current and efficient.

December 24, 2013 - 5:29pm
