The way software is developed and shipped has undergone several evolutions in the last few decades. But from the monolithic software development lifecycle phases of the waterfall era, to the V-model, followed by agile and now DevOps, one thing has remained unchanged: the way we treat test execution results, especially for automated tests.
In the waterfall era, the tools available for automating functional tests provided ways to drive the UI, but the test verification part was limited to certain checkpoints only. This compelled testers to generate the test result reports manually after each run and analyze them for any anomalies. Microsoft Excel used to be the preferred spreadsheet package for analyzing such results. This served the purpose at the time, because the phased nature of development methodologies allowed enough time for analysis and reporting of long and repeated tests.
Today’s DevOps era demands continuous testing for release cycles as often as weekly or daily in some cases. Apple, for example, released eight major iOS 12 versions, excluding beta, between June 2018 and April 2019. Similarly, Google launched eight major Chrome browser version updates in 2018.
The continuous delivery pipeline requires each process of the development lifecycle to be well fed with constant feedback, often referred to as the feedback loop. This pipeline requires automated tests to be constantly running, with tests triggered on every code commit. Several agile and DevOps teams also use functional and nonfunctional test automation tools in addition to unit tests as part of this test delivery process.
The digital mesh we are in today also brings product offerings with several usage manifestations—for example, Netflix can be consumed on a smartphone, TV, laptop, tablet, or gaming station. Each such delivery platform requires its own testing to ensure service assurance on that platform.
Also, several products are an orchestration of several subsystems that go into complete experience delivery. Each subsystem is built on a specific platform and requires its own tests. All such subsystems and service consumption platforms are tested using specific test tools and frameworks.
However, the fundamental approach to handling test execution results has remained the same: Each test cycle for each platform or subsystem generates its own test execution results, which are often still kept in the form of a spreadsheet or a flat file. These results frequently become the end point for that test cycle and are discarded as soon as they have been manually analyzed and a test report has been created.
The test metrics teams typically collect are centered around defect count, test coverage, requirements coverage, execution status (how many tests passed or failed, how many tests were executed or skipped, and who tested what), and test velocity (how much was tested in a given timeframe).
But testers need to be looking at what business value they are delivering with their test efforts.
The business side wants the answers to questions like:
- Has the product quality improved? By how much?
- How much money has our testing saved?
- Have we been able to prevent defects?
Metrics like defect count, test case execution classification (passed, failed, skipped, blocked), and test velocity do not answer these questions.
Test Automation in the Continuous Delivery Pipeline
In the context of the continuous delivery pipeline, all the stages are interconnected and cannot function in isolation. Stages like development, continuous integration, automated testing, go live, and monitoring must provide bidirectional feedback to sustain a seamless release cadence.
But automated tests as typically conducted fall short of providing adequate and timely feedback to the rest of the processes.
That’s because test automation has conventionally been seen as little more than a way to increase test velocity. But this is a shallow understanding of test automation. Velocity is of little use if the direction is wrong or unknown. With tight timelines, it becomes essential to test precisely the areas that need testing.
Teams struggle with executing thousands of tests that are often redundant, obsolete, incorrect, or irrelevant, none of which are required for the specific release. The struggle often leads to skipping tests solely based on their low priority in the holistic context of the product or subsystem, ignoring the fact that they may be relevant in the context of the change being shipped in the release.
The real potential of test automation can be realized if it is complemented with regression analysis (in the statistical sense, not regression testing): a technique that estimates the relationship between a dependent variable and one or more independent variables, so that an unknown value can be predicted from known ones. This is all now possible without buying any expensive tools or discarding existing test assets.
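As a minimal sketch of the idea, the simplest form of regression fits a straight line to past observations and uses it to estimate an unknown value. The choice of variables here (lines changed per release versus defects found) and the numbers are purely hypothetical and chosen only to illustrate the arithmetic; real data will be noisier and may need more variables.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Estimate y for a new x using the fitted line."""
    return slope * x + intercept


# Hypothetical history: lines changed in a release and defects found in it.
lines_changed = [100, 200, 300, 400]
defects_found = [3, 5, 7, 9]

slope, intercept = fit_line(lines_changed, defects_found)
estimate = predict(slope, intercept, 250)  # expected defects for 250 changed lines
```

A fit like this is only as good as the history behind it, which is why retaining execution results matters.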
Using Test Execution Data
Test execution results often contain valuable latent information about the product. Since teams usually do not retain test execution results, precious insights and the bigger picture of overall product health are lost.
Automated test frameworks often use libraries to publish the execution results and generate static test reports. Instead of stopping at such static documents, teams can store the results in a centralized test results database. Metadata such as the product release, test cycle or iteration, actual scripts, related stories or requirements, script name and identifier, execution date and timestamp, execution results with logs, and any other information generated can be stored in any database that fits the purpose.
Tests often also generate machine data like device vitals, event logs, and server parameters. All such information, which is usually already timestamped, can also be stored as measurements in an appropriate database.
Code change lists and commit-related metadata can also be stored centrally in a database. There are APIs available for most modern code management systems for this purpose. Information like which source file was changed, by whom, how many changes are in the file, and the file path can be included in such code change metadata.
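One possible shape for such a central store is sketched below using Python's built-in sqlite3 module. The table names, column names, release labels, and sample rows are all assumptions made for illustration; a real team would adapt them to its own tools and would probably use a server database rather than an in-memory one.

```python
import sqlite3

# Hypothetical schema: one table for test executions, one for code changes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS test_runs (
    id INTEGER PRIMARY KEY,
    release TEXT NOT NULL,
    cycle TEXT NOT NULL,
    script_id TEXT NOT NULL,
    executed_at TEXT NOT NULL,  -- ISO 8601 timestamp
    result TEXT NOT NULL
        CHECK (result IN ('passed', 'failed', 'skipped', 'blocked')),
    log TEXT
);
CREATE TABLE IF NOT EXISTS code_changes (
    id INTEGER PRIMARY KEY,
    release TEXT NOT NULL,
    commit_id TEXT NOT NULL,
    author TEXT NOT NULL,
    file_path TEXT NOT NULL,
    lines_changed INTEGER NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open (and if needed create) the results store."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_run(conn, release, cycle, script_id, executed_at, result, log=None):
    """Store one test execution result."""
    conn.execute(
        "INSERT INTO test_runs (release, cycle, script_id, executed_at, result, log)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (release, cycle, script_id, executed_at, result, log),
    )
    conn.commit()


def record_change(conn, release, commit_id, author, file_path, lines_changed):
    """Store one changed file from a commit."""
    conn.execute(
        "INSERT INTO code_changes (release, commit_id, author, file_path, lines_changed)"
        " VALUES (?, ?, ?, ?, ?)",
        (release, commit_id, author, file_path, lines_changed),
    )
    conn.commit()


def failures_per_release(conn):
    """Return {release: number of failed executions}."""
    rows = conn.execute(
        "SELECT release, COUNT(*) FROM test_runs"
        " WHERE result = 'failed' GROUP BY release ORDER BY release"
    ).fetchall()
    return dict(rows)


# Illustrative data only.
conn = open_store()
record_run(conn, "R1", "nightly", "test_login", "2019-04-01T02:00:00", "failed")
record_run(conn, "R1", "nightly", "test_search", "2019-04-01T02:01:00", "passed")
record_run(conn, "R2", "nightly", "test_login", "2019-04-08T02:00:00", "passed")
record_run(conn, "R2", "nightly", "test_search", "2019-04-08T02:01:00", "failed")
record_run(conn, "R2", "nightly", "test_billing", "2019-04-08T02:02:00", "failed")
record_change(conn, "R2", "abc123", "dev_a", "services/catalog/index.py", 42)

failed_by_release = failures_per_release(conn)
```

Because both tables carry the release label, results and code changes can later be joined to ask which changes coincided with which failures.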
Once all this information is available centrally, testers can use visual analytics tools to process the data and see what interesting patterns emerge. Predictive analytics models can also be designed and applied using R or Python on centralized test results data.
There are several useful things the automated test execution history can tell you:
1. Identifying and excluding flaky tests: Once analyzed in conjunction with source changes, test execution results can suggest which tests are flaky, meaning they both pass and fail for the same code and configuration. Every test team has such flaky tests, and they consume a lot of execution, analysis, and maintenance time and add noise to the release pipeline. Time series analysis applied to past test execution results can also show which tests are flaky. Flaky tests can be excluded from execution until they are fixed in the test harness.
2. Excluding unaffected subsystems: A product usually is composed of several subsystems that are often written in different languages. From the source change list in the execution history, you can identify which subsystem is stable and has not undergone any change in the present release scope so that tests pertaining to it can be skipped.
3. Code author’s impact: No two developers are equal, and some are more prone to introducing bugs. With the test execution data mapped to the source change history, you can empirically find out what to test according to the changes being committed by each individual. Also, when more than two people collaborate on code, or if a specific piece of code has undergone changes too many times, it usually contains more defects. You can see from the change lists what is in store for you.
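The first two ideas in this list can be sketched in a few lines of Python. This is only one simple interpretation: a test is treated as flaky if it has both passed and failed under the same configuration key, and a test is selected only if its subsystem contains a changed file. The test names, path prefixes, and the "revision/platform" key format are hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return tests that both passed and failed under an identical configuration.

    runs: iterable of (test_id, config_key, result), where config_key
    identifies the code revision plus environment the test ran against.
    """
    outcomes = defaultdict(set)
    for test_id, config_key, result in runs:
        if result in ("passed", "failed"):  # ignore skipped or blocked runs
            outcomes[(test_id, config_key)].add(result)
    flaky = {test_id for (test_id, _), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


def select_tests(test_to_subsystem, changed_files, path_to_subsystem, flaky):
    """Keep tests whose subsystem has a changed file, minus known flaky tests."""
    changed_subsystems = {
        subsystem
        for prefix, subsystem in path_to_subsystem.items()
        for path in changed_files
        if path.startswith(prefix)
    }
    return sorted(
        test_id
        for test_id, subsystem in test_to_subsystem.items()
        if subsystem in changed_subsystems and test_id not in flaky
    )


# Illustrative history only.
runs = [
    ("test_login", "rev42/android", "passed"),
    ("test_login", "rev42/android", "failed"),   # same config, different result
    ("test_search", "rev42/android", "failed"),
    ("test_search", "rev43/android", "passed"),  # different revision, not flaky
    ("test_billing", "rev43/android", "passed"),
]
test_to_subsystem = {
    "test_login": "auth",
    "test_search": "catalog",
    "test_billing": "payments",
}
path_to_subsystem = {
    "services/auth/": "auth",
    "services/catalog/": "catalog",
    "services/payments/": "payments",
}
changed_files = ["services/auth/session.py", "services/catalog/index.py"]

flaky = find_flaky_tests(runs)
selected = select_tests(test_to_subsystem, changed_files, path_to_subsystem, flaky)
```

Here test_login is set aside as flaky, test_billing is skipped because the payments subsystem did not change, and only test_search is left to run. In practice a single contradictory pair may be too strict a signal, so a team might require several flips before excluding a test.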
Collating all the test execution results and mapping them with the source code change history using a relational database as well as a time series database holds the promise of discovering interesting patterns in your product health. Applying predictive regression analysis models on the data captured can further lead to the discovery of potentially defective areas of the product. It can also predict the gain or loss of product quality, impact of people changes on the release schedule, and how the product will behave with changes in the operating conditions.
All such observations will hold more value in the eyes of the business side, since they provide actionable insight about the product’s overall quality. This helps testers contribute business value and increase their own worth on the project team.