My team is responsible for testing an accounting application that performs complex calculations and posts ledger entries to the accounting system. The application is also workflow-driven, and the monthly book close is its most critical flow.
With each release there is some change in the calculations—for example, an account may need to be debited more or credited more, or a value should now be shared by two accounts. In a complex application with many internal and external dependencies, these changes often result in unexpected behavior, including regression issues, in seemingly unrelated areas.
Let’s establish something about a regression issue, sometimes called a critical incident or production issue. A few years in the software industry teaches you that a regression issue leaked to production is far more critical than a broken new feature. A new feature has not yet been adopted, so when it breaks it has less impact on the business, whereas with a regression issue, the user is relying on a working feature and most likely has no fallback system for it.
Regression issues are caused by changes that were not factored in by the project manager or product owner in the acceptance test; the architect, developer, or code reviewer in the design or implementation phase; or the QA analyst or tester in the test cases. The issue was missed by all safety nets and hit the user. If the impact of the change was identified by any of the above phases, teams, stakeholders, or any other fancy name that you use in your organization, then it would have been addressed before the release, or at least evaluated for criticality.
Regression issues entail several costs: urgent fixing, penalties for potential breaches of a service-level agreement, and, most importantly, the erosion of your users’ trust. They may come to fear that every new release will break something that works.
It was becoming increasingly important to my team to test everything.
However, it is well established that this is impossible, or at least too expensive and time-consuming to be feasible. So “testing everything” was scoped down to the critical business flows, with one goal: make sure their behavior is the same as in the last release, before the change. Basically, we wanted to ensure no regression issues were introduced into the critical business flows with the release.
We evaluated our typical approach to automated testing: verifying the calculations by rewriting all of the logic in test automation code. This approach poses the classic problem of “Who tests the tester’s code?” When a test case fails, there is an equal chance that the issue is in the test code. Also, each time the application changes, the test code needs an update, and such changes were frequent.
We also evaluated automated testing with a fixed set of inputs and known expected outputs. Due to the complex nature of the application, it was not possible to build all possible inputs in one go, and we needed different data sets for month-end, quarter-end, and year-end. The schedule was a major problem: building a huge input set and the corresponding scripts would take a long time.
There was another variable in the mix: user data. We had the privilege of receiving user data backups every month. The quality of a test depends on the data used for testing, and production data is always better than generated data, so this was a huge advantage that we did not want to pass up.
Identifying and Implementing Regression Tests
Our approach was to have test automation that needs the least possible maintenance and that builds confidence in stakeholders that the release is of good quality.
So we needed a test strategy for the critical business use cases that would ensure no regression issues were introduced and, of course, that we could implement fast.
This was our setup process:
- The production database backup is restored twice
- Two parallel test systems are set up:
  - One with the production-released code
  - One with the current version of the application under test
This provides two identical setups that differ only in the code version.
Keeping the two setups the same is critical, as this ensures that any issue is only from the new changes being pushed.
Test cases are split, so instead of the standard “perform action and verify reaction” process, actions are performed from one workflow milestone to the next, and then reports are compared. This comparison is the key to identifying unexpected changes.
When a tester is focused on a feature or change, the test typically covers the change and ensures it is in place. Regression testing is different in that it has to verify that nothing else has changed. This difference in mindset is reflected in automation scripts, and it makes feature test scripts unsuitable for finding regression issues, so we need a different approach to the problem.
For example, if you’re working with an order management system, an order-placing script with multiple inputs would place orders on the two setups (preferably in parallel), and then you’d pull a daily order book report from both setups and compare them value by value. Next, all orders would be confirmed or approved (this is the action), and then reports such as daily approved orders, orders by item, inventory, and orders per carrier, shipment type, or payment type would be compared value by value. This continues through the workflow. You can also club actions together, like order placing and approving, and then compare reports at certain milestones.
Another example would be a hotel management system where discrete actions are identified as critical business flows, like check-in, restaurant billing, and inflow of inventory. All these processes will have their respective actions and reports. The difference in such a system compared to the previous example is that the suite can run in parallel, and there’s no need to complete a step before you start the next step.
The comparison of the two reports is the moment of truth, and it has to be impeccable; none of the stakeholders should have any doubt about its correctness. A reported difference must be a true difference.
We use a web service interface for this. All report calls are made in parallel to the two systems, and the JSON response is compared.
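As a sketch, assuming a simple HTTP report endpoint on each setup (the base URLs, the `/reports/` path, and the report names are illustrative, not a real API), the parallel report calls could look like this:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical base URLs for the two parallel setups; adjust to your
# environment.
PROD_BASE = "http://prod-qa.example.com"
TEST_BASE = "http://next-qa.example.com"

def fetch_report(base_url: str, report: str) -> dict:
    """Pull one report as JSON from a setup's web service interface."""
    with urlopen(f"{base_url}/reports/{report}") as resp:
        return json.load(resp)

def fetch_pair(report: str, fetch=fetch_report) -> tuple:
    """Call the same report on both systems in parallel and return
    (production response, application-under-test response)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        prod = pool.submit(fetch, PROD_BASE, report)
        test = pool.submit(fetch, TEST_BASE, report)
        return prod.result(), test.result()
```

The `fetch` parameter is injectable so the orchestration can be exercised without a live system.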
The comparison is threefold:
- Source (production) minus target (application under test)
- Target minus source
- Value comparison for the intersection
The same holds for XML, XLS, CSV, fixed-width, or any other format. We need to ensure that there are no extra records, no missing records, and that all values of the matching records are identical.
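A minimal sketch of this threefold comparison, assuming each report row carries a unique key (the `id` field name here is an assumption):

```python
def compare_reports(source_rows, target_rows, key="id"):
    """Threefold comparison of two report extracts:
    source minus target, target minus source, and value
    differences on the intersection."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}

    missing_in_target = sorted(src.keys() - tgt.keys())  # source minus target
    extra_in_target = sorted(tgt.keys() - src.keys())    # target minus source

    value_diffs = {}                                     # intersection check
    for k in src.keys() & tgt.keys():
        diffs = {f: (src[k][f], tgt[k].get(f))
                 for f in src[k] if src[k][f] != tgt[k].get(f)}
        if diffs:
            value_diffs[k] = diffs
    return missing_in_target, extra_in_target, value_diffs
```

An empty result in all three parts means the two reports match record for record and value for value.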
This is the core of the approach: comparing the read calls available in an application, typically the reports or, in some cases, the interfaces to other applications.
The success of this approach lies in a comparison tool or utility that handles the cases relevant to your application. You can consider yourself lucky if you find something off the shelf; if not, then investment in this area is worth it, as you will reap good benefits from it.
After all this automation, it’s time for manual intervention. Because some differences are expected, as required by the new functionality, the results need manual analysis. Clear passes need no further attention, but failures must be analyzed and confirmed as valid. As these are regression failures, they must be fixed before release. Of course, there will be some exceptions that have to be handled based on the application.
Setting Up Your Program
Every application is different, and so is the application setup. There can be some other steps needed to prepare your application for testing, so these steps need to be factored in at the right time in the process. But these are the typical steps:
- Obfuscate data from production by deleting email IDs or other sensitive information, or replacing it with dummy data
- Get the data in the correct state for the test to start
- Address configurations for the QA environment, like changing integration links
The only point to remember here is that all of the above actions need to be performed for both setups. Remember, before test case execution starts, the setups should be identical.
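As an illustration, the obfuscation step might look like the following sketch, assuming a SQLite copy of the restored backup with a hypothetical `users` table (table and column names are assumptions). It must be run against both restored copies so the setups stay identical:

```python
import sqlite3

def obfuscate(db_path: str) -> None:
    """Replace sensitive data in a restored backup with dummy values."""
    conn = sqlite3.connect(db_path)
    with conn:
        # Replace real email IDs with deterministic dummy addresses so
        # both setups end up with the same obfuscated values.
        conn.execute(
            "UPDATE users SET email = 'user' || id || '@example.test'"
        )
        # Blank out other sensitive fields entirely.
        conn.execute("UPDATE users SET phone = NULL")
    conn.close()
```

The replacement values are derived from the row ID rather than randomized, so the two setups remain comparable after obfuscation.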
It is not uncommon for actions other than report calls to also return an object; creating or modifying an order, for example, returns the new order object. In that case, it is good to compare the two objects rather than wait for the report comparison, as it can help identify a defect sooner and closer to the root cause.
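A small sketch of such an immediate object comparison, with assumed field names and an assumed list of volatile fields (IDs, timestamps) that legitimately differ between the two setups:

```python
def diff_action_result(prod_obj: dict, test_obj: dict,
                       ignore=("id", "createdAt")):
    """Field-level differences between the objects returned by the
    same action on the two setups, skipping volatile fields."""
    diffs = {}
    for field in prod_obj.keys() | test_obj.keys():
        if field in ignore:
            continue
        if prod_obj.get(field) != test_obj.get(field):
            diffs[field] = (prod_obj.get(field), test_obj.get(field))
    return diffs
```

An empty result means the action produced equivalent objects on both setups, so the workflow can proceed to the next milestone.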
It is also a good idea to break the entire suite into smaller sets, such as grouping transaction and related report calls together. You can run the sets in parallel to save execution time. For a workflow application, this may not be an option unless you can slice the cases horizontally versus vertically, or vice versa.
The variations can start with technology—JSON, XML, or scalars (int/string/float)—and extend to cases where the test and production installations are expected, by design, to respond with different structures. For example, the production release may use an older JAR file that produces a certain format; if the JAR is updated in this release and the response format changes, comparing the two will report everything as a mismatch. A temporary adapter would be needed to make the comparison possible.
Though these situations will probably be few and far between, sometimes it is easier to get the design fixed, and sometimes it is easier to address them with a workaround.
There are a few options to handle these kinds of comparisons:
- Ignore some fields, like IDs and dates
- Ignore numeric differences of less than 0.0001
- Ignore case sensitivity of strings
- Handle structural changes between the two responses
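The first three options could be sketched as a field-level match function; the defaults below mirror the list above and are only illustrative:

```python
def values_match(a, b, ignore_case=True, tolerance=0.0001):
    """Tolerant comparison of two report values."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) < tolerance          # ignore tiny numeric drift
    if ignore_case and isinstance(a, str) and isinstance(b, str):
        return a.casefold() == b.casefold()    # ignore case differences
    return a == b
```

Fields such as IDs and dates would simply be excluded from the comparison before this function is ever called.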
Improving Your Regression Testing
Regression testing needs to be holistic in verification while being focused in scope. This balance can benefit from automation. An automated testing approach specific to reducing regression issues can go a long way toward a good client relationship and high brand value.
In the spirit of continuous improvement, my team now plans to get rid of the two identical setups and implement the same strategy with a single setup: save the responses from previous executions and use them for comparison. Testing approaches can always be improved upon—wish us luck!
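That single-setup idea could be sketched as a record-and-replay baseline store, assuming responses are persisted as JSON files named after the report (the paths and naming are assumptions):

```python
import json
from pathlib import Path

def save_response(run_dir: str, report: str, payload: dict) -> None:
    """Persist one report response from the current release's run."""
    Path(run_dir).mkdir(parents=True, exist_ok=True)
    Path(run_dir, f"{report}.json").write_text(
        json.dumps(payload, sort_keys=True)
    )

def load_baseline(run_dir: str, report: str) -> dict:
    """Load the saved response from the previous release's run."""
    return json.loads(Path(run_dir, f"{report}.json").read_text())
```

The next release's responses would then be compared against these baselines instead of against a second live setup.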