Automated UI tests as regression tests do not work very well in practice. These tests are expensive to build and a burden on the project due to constantly high maintenance effort, so much so that people have calculated the ROI of reducing their automated regression tests. Not only that, they are brittle and incomplete, and their results are somewhere between unreliable and misleading.
This is the reason for the famous test pyramid. And all of this only because we have misunderstood regression testing and are using it wrong!
Everybody talks about continuous delivery and shifting left. But there is an elephant in the room: much more test automation is required than most projects currently have. And not everyone is Google or Facebook, able to afford showing an error to a few thousand users now and then. This means that before the big dream of continuous delivery (or even regular delivery) can be achieved, the topic of automated testing has to be addressed. And this may require approaches other than the ones we are currently using.
You probably know the common tools for recording or "scripting" UI regression tests. It doesn't matter whether these are open source tools or commercial alternatives; they all work on the same underlying principles. A test is basically a program itself, which controls the software under test by simulating user inputs and checking results. Anything you do not check is an opportunity for errors to slip through. But each check has to be created and maintained manually, which means a lot of effort.
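To make this concrete, here is a minimal sketch of such a scripted test in plain Python. The `LoginScreen` class is an invented stand-in for a real UI layer; actual tools drive a browser or desktop application, but the shape of the test is the same: simulate input, then manually assert on every output you care about.

```python
# A scripted UI test is itself a program: it feeds simulated user
# input to the application and asserts on what comes back.
# "LoginScreen" is a hypothetical stand-in for a real UI layer.

class LoginScreen:
    def __init__(self):
        self.username = ""
        self.password = ""
        self.message = ""

    def type(self, field, text):      # simulate typing into a field
        setattr(self, field, text)

    def click_login(self):            # simulate a button click
        if self.username == "alice" and self.password == "secret":
            self.message = "Welcome, alice!"
        else:
            self.message = "Invalid credentials"

def test_login_succeeds():
    screen = LoginScreen()
    screen.type("username", "alice")
    screen.type("password", "secret")
    screen.click_login()
    # Every property we care about needs its own hand-written check:
    assert screen.message == "Welcome, alice!"

test_login_succeeds()
```

Note that only the one property we explicitly assert on is protected; any other part of the screen can break without this test noticing.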
If you have a screen with fifty input fields (which can happen quickly, for instance with tables), then you have to define fifty checks. With only ten tests, that adds up to five hundred checks. No wonder only the most essential checks are carried out in conventional tests, which means that the tests are incomplete.
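The arithmetic can be sketched directly; the field names and values below are invented for illustration:

```python
# Ten tests over a fifty-field screen, each field checked individually.
expected = {f"field_{i}": f"value_{i}" for i in range(50)}

def run_one_test(actual):
    checks = 0
    for name, want in expected.items():  # one hand-written check per field
        assert actual[name] == want
        checks += 1
    return checks

total_checks = sum(run_one_test(dict(expected)) for _ in range(10))
print(total_checks)  # 500 assertions to create and maintain by hand
```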
Of course, the developers in us immediately have plenty of good ideas about how we can abstract and simplify. And many ideas are implemented in numerous tools. But these approaches only reduce the extent of the underlying problem; they do not solve it.
Another common problem is that you need business knowledge to define the test, but a programmer (or at least a trained tester) to encode or "script" it. Many tools already go a long way toward simplifying this process, and often you can learn the scripting language in a few days if you are technically inclined, but it is still an extra hurdle.
Regression Tests Are Not Functional Tests
The purpose and usefulness of a functional test is to uncover functional errors in the code. After performing the test, we can be sufficiently sure that the tested functionality has been implemented correctly. Functional tests are mandatory when a feature has been added or changed. But are functional tests also good regression tests?
A regression test has a different objective from a functional test. A regression test tests software after it has already been manually tested, approved, and probably installed and used by the customer; in effect, it is already working software. If this software still contains functional errors that have gone unnoticed for years, it is not the task of the regression test to find those errors. Yeah, you read that right: the goal of a regression test is not to find errors.
If the code contains an error, but the software has already been used by the customer for three years, then that error often must not be corrected, because customers and downstream processes may rely on the erroneous behavior. In some situations, errors that had been fixed without asking the customer first even had to be reintroduced.
But if the goal of a regression test is not to find errors, then what does it do? The goal of a regression test is functional consistency. The regression test ensures that, after a change to the software, the unchanged parts still work the same as before—regardless of whether this is functionally correct in principle.
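A minimal sketch of this idea, using an invented `format_total` function: the golden output is recorded once from the approved version, and the regression check only asks whether the current output is still the same, not whether it is correct.

```python
# Regression as consistency: compare current output to a recorded
# "golden" output. Whether the golden output is functionally correct
# is irrelevant; only the difference matters.

def format_total(amount_cents):
    # May even contain a long-standing, accepted quirk; the check
    # below does not care, as long as the behavior is unchanged.
    return f"Total: {amount_cents / 100:.2f}"

golden = format_total(1005)  # in practice, loaded from a recorded file

# Later, after unrelated changes, the check only asks: same as before?
assert format_total(1005) == golden
```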
The Kind of Regression Tests We Need
As the adage goes, "If a hammer is your only tool, everything looks like a nail." We use functional tests as regression tests solely because we lack tool support for what we actually need: a consistency check.
Functional tests "fixate" the behavior of the application. This is good if the behavior of the application must not change. But in reality, the behavior of the application has to change all the time. And this is when these "fixating" tests transform from help to hindrance.
In order to recognize a functional regression after a change, we do not need manually defined functional checks. It is entirely sufficient to recognize changes in the behavior of the system, much as a version control system recognizes changes in code. Accepting the current state as correct, or at least as a given, and defining tests based on that state is not a new idea. Michael C. Feathers describes this approach in his book Working Effectively with Legacy Code and calls it a "characterization test."
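A characterization test can be sketched in a few lines. `legacy_tax` is a hypothetical function under test; the first run records whatever it currently does, and later runs merely compare against that recording:

```python
# A characterization test in the spirit of Feathers: record the
# current behavior on the first run, then detect deviations later.

def legacy_tax(amount):
    return round(amount * 0.19, 2)    # taken as-is, correct or not

recorded = {}   # in practice: a file kept under version control

def characterize(inputs):
    actual = {x: legacy_tax(x) for x in inputs}
    if not recorded:
        recorded.update(actual)        # first run: record, don't judge
        return []
    # later runs: report every input whose behavior changed
    return [x for x in recorded if recorded[x] != actual.get(x)]

assert characterize([100, 250]) == []  # first run records the behavior
assert characterize([100, 250]) == []  # later run: behavior unchanged
```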
However, this "recognition of changes" is not enough. What makes a version control system so effective is the possibility of either ignoring volatile changes permanently (masking them) or being able to simply commit these changes and update the associated tests efficiently. And that's exactly what we need for our regression test system, too.
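These two operations, masking volatile values and approving (committing) intended changes, can be sketched as follows; the field names are invented:

```python
# Version control for behavior needs two operations on top of diffing:
# permanently masking volatile values, and approving intended changes.

VOLATILE = {"timestamp", "session_id"}   # permanently ignored fields

def diff(golden, current, masked=VOLATILE):
    return {k: (golden.get(k), current.get(k))
            for k in golden.keys() | current.keys()
            if k not in masked and golden.get(k) != current.get(k)}

golden = {"title": "Checkout", "total": "10.05", "timestamp": "09:14"}
current = {"title": "Checkout", "total": "10.50", "timestamp": "17:02"}

changes = diff(golden, current)
assert changes == {"total": ("10.05", "10.50")}   # timestamp is masked

# "Approving" a reviewed change simply updates the golden state:
golden.update({k: new for k, (old, new) in changes.items()})
assert diff(golden, current) == {}
```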
In one of my recent projects, we built something similar ourselves. We created a web crawler that crawled our site and generated FitNesse tests with assertions from everything it found. But for the parts of the system that still changed frequently, the "copy and paste" of updates was cumbersome; the "review changes and approve" functionality of such a system was missing. So we only created tests for mature and stable parts of the system. This worked very well and reduced both testing effort and risk enormously. Quite often, these tests spotted nuanced differences that manual testing would likely have missed.
Fixing Our Regression Tests
Current approaches to automated UI testing are broken because regression testing is not testing; regression testing is version control of the behavior of the system. This realization fixes many common problems and makes creating and maintaining tests much more efficient.
User Comments
We are starting a Kickstarter campaign to fund our comparison tool rediff. Please back us here.
Hi Jeremias, great read! I am curious how you would classify testing a bug fix? This would 'appear' to fall under regression testing, but it also lends itself toward functional testing. Or is this a different classification altogether? And once the bug has been fixed, does the test become a regression test?
Hi James, great you liked it and thank you for your comment.
I think regression testing is just testing again - no matter whether this is functional testing, performance testing, or any other form of testing. The software at some point had some wanted attribute (e.g. feature / functionality) and is tested again to see if it still has that attribute (e.g. after a change). Most common are functional regression tests, ideally automated.
In my opinion, a bug means that the actual behaviour of the software differs from the expected behaviour of the software. Sometimes, this difference was always there and was only recently discovered. Sometimes, the expected behaviour changed, so that behaviour that was previously considered correct became erroneous. It doesn't really matter. If the bug is already covered by a test, then the bug fix is a change like any other, and requires an adaptation of the test, again lending itself to the version control analogy. If it was not covered by a test, then it probably makes sense to add that test.