In today’s world of continuous integration and testing, engineers are finding many ways to make all types of testing happen earlier in the development process. We are no longer limited to running only unit or functional tests when integrating quality activities into the CI process.
Performance tests are excellent candidates for inclusion in the continuous delivery process due to the value they can add in early detection of both functional and performance issues.
As performance test practices mature over time, a team’s test suite grows larger as coverage increases, and it can become a challenge to schedule and execute the full suite of tests within the constraints of a quick delivery cycle. It’s even more challenging to keep performance tests running with as little intervention as possible.
The model of automation we aim for is called “hands-free”: hands-free automation starts, executes, reports results, and cleans up after itself, all without intervention. In the real world, unfortunately, even the best automated tests struggle to reach (and stay in) the hands-free state. No matter how mature your performance tests are, there is nearly always a significant amount of work needed to keep the tests running, the environments healthy, and the results relevant.
Over time, performance testers spend more and more time watching over their tests and diagnosing potential issues, which leaves them less time to mature and refine the test suite. Perhaps without even realizing it, they fall into a cycle of running the same tests and seeing the same results day in and day out. With no time to think about improvements, the mind naturally switches to autopilot and focuses only on the red and green of a build report or dashboard.
An application must be actively and continually improved, or else its “fitness for purpose” will deteriorate; this is Lehman’s law of continuing change. As an example of this law in action, how many of us are still carrying around a mobile phone from 2007, when the first iPhone was launched?
So if a system requires ongoing effort to remain fit for purpose, and a performance test team only has time to execute its existing tests and never to do that work, there are only two possible outcomes. In the first, the tests become so obsolete that they are ignored. Their practical value becomes so small that no one cares about them anymore, and they wither on the vine. The testers eventually get bored and move to other jobs, and everyone forgets the need for performance testing, perhaps until customers begin complaining about poor performance a few months later. In the second, the testers continue to support the existing suite, but a backlog accrues. Known as technical debt, this backlog represents the difference between the work that must be done to keep the tests current and the work that was actually done.
If you are responsible for performance testing and find yourself unable to keep up with the work needed to keep your test suite fit for purpose, it is important to at least identify and document the technical debt that is accruing. If you are unsure what you should be doing to control technical debt, the following questions are worth considering for any existing performance test suite. Asking them regularly will go a long way toward keeping your tests fit and sustainable and toward controlling a few common causes of technical debt in performance tests.
What is the purpose of each test? Does this test still matter?
It may seem that if a test is running, it must be running for a reason, but this is not always the case. Over time, the system under test changes, customer usage patterns evolve, and requirements mature. Tests, particularly those that rarely fail, may go months or longer without anyone considering them. The original reason for running a given test may have long since changed, and without a proper, regular evaluation of each test’s purpose, the system under test and its requirements move on while the static set of tests becomes more of a time sink and less of a valuable tool.
To determine whether ongoing tests are still relevant, we recommend a recurring session in which the performance test team, the project manager or ScrumMaster, at least one developer, and the product owner or business owner review the list of tests being executed. A deep dive into every test is not necessary; a quick read-over of each test’s title and description is usually enough to jog the team’s memory.
In this session, the team should decide the status of each active test. Is the test still relevant, so it should continue to run? Is it no longer relevant, so it can be removed from the suite? Or is it somewhat relevant but in need of adaptation because conditions have changed?
After all existing tests are evaluated, the team should also address the question of missing tests. A great time to consider new or missing tests is while the existing ones are still fresh in everyone’s mind. Asking the team, “Are there any tests we should be executing that are not being executed today?” helps keep the test suite current and relevant.
By ensuring that the right tests are being executed and by regularly reviewing the suite for missing tests, the performance test team can address test suite discrepancies as a routine part of their job and avoid running spurious tests for extended periods of time. The test suite grows and evolves alongside the application as the right amount of work is applied at the right time, rather than building up a mass of debt.
Are the metrics and logs we are gathering trying to send a message that we’re not seeing? Are all my metrics, logs, and artifacts still being gathered?
Properly executed performance testing is an empirical activity made possible by the collection and analysis of many measurable data points. Unlike many functional tests, it is not always easy, or even possible, to determine from a single measure whether a performance test “passes” or “fails.” For example, a system that responds quickly but whose memory utilization climbs steadily over time deserves further attention, as the growth could indicate an unhealthy condition such as a memory leak. Without measuring memory utilization and the other vital signs of system performance, indicators like these are easily missed.
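A trend like steadily rising memory can be checked automatically rather than by eye. The following is a minimal Python sketch, assuming memory samples are available as paired timestamps (in seconds) and memory values (in MB); the function names and the 50 MB-per-hour threshold are hypothetical and would need tuning for a real system. It fits a least-squares line to the samples and flags the run if the slope exceeds the threshold.

```python
from typing import Sequence


def memory_growth_rate(times_s: Sequence[float], mem_mb: Sequence[float]) -> float:
    """Least-squares slope of memory usage over time, in MB per second."""
    n = len(times_s)
    if n != len(mem_mb) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_t = sum(times_s) / n
    mean_m = sum(mem_mb) / n
    # Covariance of (time, memory) divided by the variance of time gives the slope.
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(times_s, mem_mb))
    var = sum((t - mean_t) ** 2 for t in times_s)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var


def flag_memory_growth(times_s: Sequence[float], mem_mb: Sequence[float],
                       max_mb_per_hour: float = 50.0) -> bool:
    """True if memory grows faster than the (hypothetical) allowed hourly rate."""
    return memory_growth_rate(times_s, mem_mb) * 3600 > max_mb_per_hour


if __name__ == "__main__":
    # Sample data: one reading per minute, memory rising 1 MB per minute (60 MB/hour).
    times = [0, 60, 120, 180, 240]
    memory = [500, 501, 502, 503, 504]
    print(flag_memory_growth(times, memory))  # True
```

A positive slope is only a prompt for investigation: caches and connection pools can legitimately grow early in a run, so a real check might discard the warm-up period before fitting the line.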
To verify that your metrics are still being collected, we recommend setting aside a small amount of time on a recurring basis to quickly review what is being collected, even when the test assertions pass. For example, right after an automated set of regression-type performance tests completes is a perfect time to check that 1) the primary metrics, such as transactions per second or files per second, are stored somewhere accessible so they can be reviewed whenever necessary; 2) the secondary metrics, such as CPU, memory, network, and disk utilization, are accessible; and 3) any data extracted from logs (as well as the original logs, if needed for audit requirements) is available.
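A post-run check of this kind is easy to automate. The Python sketch below assumes a hypothetical layout in which metric files are written (or appended to) under a results directory; the file names, the `check_artifacts` function, and the 24-hour staleness limit are illustrative assumptions, not part of any particular tool. It reports files that are missing, empty, or not modified recently, which also catches metrics that silently stopped being collected.

```python
import time
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical layout: one file per metric under the results directory.
EXPECTED_ARTIFACTS: Dict[str, List[str]] = {
    "primary": ["transactions_per_second.csv"],
    "secondary": ["cpu.csv", "memory.csv", "network.csv", "disk.csv"],
    "logs": ["extracted_log_data.csv"],
}


def check_artifacts(results_dir: str,
                    expected: Dict[str, List[str]] = EXPECTED_ARTIFACTS,
                    max_age_s: float = 24 * 3600,
                    now: Optional[float] = None) -> List[str]:
    """List artifacts that are missing, empty, or older than max_age_s seconds."""
    now = time.time() if now is None else now
    root = Path(results_dir)
    problems: List[str] = []
    for category, names in expected.items():
        for name in names:
            path = root / name
            if not path.is_file():
                problems.append(f"{category}: {name} is missing")
                continue
            info = path.stat()
            if info.st_size == 0:
                problems.append(f"{category}: {name} is empty")
            elif now - info.st_mtime > max_age_s:
                # An old modification time suggests collection stopped
                # (for example, an expired credential on a monitoring agent).
                problems.append(f"{category}: {name} is stale")
    return problems
```

Running it as the last step of the automated suite, and failing or at least warning when the returned list is non-empty, turns a manual review into a routine gate.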
If your log analysis is not automated, the tester should regularly search the important logs (with grep, for example) for warning or error patterns that have not been seen before. Log warnings and errors are often precursors to problems that later manifest as performance issues.
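The search for previously unseen patterns can be sketched as a set difference between “signatures” of warning and error lines. In this minimal Python sketch, which is one possible approach rather than a prescribed method, numbers and hexadecimal IDs are masked so that messages differing only in timestamps or counters collapse to a single signature; the baseline is assumed to come from logs of earlier runs the team has already reviewed.

```python
import re
from typing import Iterable, List, Set

# Lines at these severity levels are the ones worth tracking.
LEVEL_RE = re.compile(r"\b(?:WARN|WARNING|ERROR|FATAL)\b")
# Mask hex IDs and numbers so messages differing only in timestamps or counters match.
VOLATILE_RE = re.compile(r"0x[0-9a-fA-F]+|\d+")


def signatures(lines: Iterable[str]) -> Set[str]:
    """Normalized signatures of the warning and error lines in a log."""
    return {VOLATILE_RE.sub("#", line.strip()) for line in lines if LEVEL_RE.search(line)}


def new_patterns(current_lines: Iterable[str], baseline_sigs: Set[str]) -> List[str]:
    """Warning/error signatures in the current log that the baseline has never seen."""
    return sorted(signatures(current_lines) - baseline_sigs)


if __name__ == "__main__":
    # Sample log lines for illustration only.
    baseline = signatures(["2024-01-01 10:00:01 WARN pool size 10 exceeded"])
    current = [
        "2024-01-02 11:00:00 WARN pool size 12 exceeded",
        "2024-01-02 11:00:05 ERROR timeout after 3000 ms",
        "2024-01-02 11:00:06 INFO request ok",
    ]
    for sig in new_patterns(current, baseline):
        print(sig)  # #-#-# #:#:# ERROR timeout after # ms
```

Because an open text file iterates line by line, the same functions work directly on log files, for example `new_patterns(open(path), baseline)`, and only the genuinely new signatures need a person’s attention.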
You don’t want to discover, after a customer reports a problem in production, that your secondary performance metrics stopped being gathered weeks ago because a password expired or because of some other mundane yet common issue. Because these metrics are often not reviewed with every run, it is easy to become complacent and assume they look fine, or even that they are being gathered at all.
Catching early warning signs in logs as soon as they appear reduces the chance that the causes of performance issues go undetected for long periods. Performance issues are often the most difficult type of issue to resolve, so catching them when their symptoms first appear can greatly reduce both the work needed to resolve them and the amount of debt that accrues.
Are my tests running optimally, or can they be improved?
If you’re like most performance test organizations, your test lab has limited resources, whether in the number of machines or in processing capability. It is important that your test suite makes the best use of these limited resources.
Over time, tests can slow down, and the cumulative impact across many tests can be quite large. By slow tests we don’t mean poor performance results but, rather, tests that take longer to execute than they used to. Tests slow down over time for many reasons. Perhaps the load generator framework was updated with a component that takes much longer to start, or perhaps the virtual machine hosting the test runner now shares a physical host with many more VMs. Whatever the cause, these tests not only consume more lab time but can also make it difficult to maintain a reasonably fast continuous integration process. If you do find tests that have slowed down, ask, “How much of the test duration is actual test time?”
In today’s world of increased automation, a performance test may have a scope much larger than simply running a load model against a system. The automation may include steps before the actual test execution, such as test data setup, product installation, and environment configuration. After the test finishes, the automation may uninstall the product, clean up after itself, report results to a repository, and archive important run data. The duration of any one of these steps can be influenced by factors outside the test team’s immediate control, so each step should be assessed regularly so that any significant change in test duration is detected and addressed.
By measuring the duration of each test, including its setup and teardown steps, tracking the trend over time, and addressing slowdowns as soon as they appear, performance testers deal with each cause while it is still small rather than waiting until the sum of all the slowdowns makes the completion time of the entire suite unacceptable. This prevents technical debt that would otherwise have to be untangled, at much greater cost, later on.
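Per-phase timing can be captured with very little code. This Python sketch assumes a hypothetical harness that wraps each phase (setup, execution, teardown, archiving) in a timer and keeps a history of past durations per phase; the `PhaseTimer` class, the `find_slowdowns` function, and the 20% tolerance are illustrative assumptions. Comparing against the median of past runs, rather than the previous run alone, keeps a single noisy run from triggering an alert.

```python
import statistics
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List


class PhaseTimer:
    """Record the wall-clock duration of each named test phase."""

    def __init__(self) -> None:
        self.durations: Dict[str, float] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the phase raises, so failed runs are timed too.
            self.durations[name] = time.perf_counter() - start


def find_slowdowns(current: Dict[str, float],
                   history: Dict[str, List[float]],
                   tolerance: float = 0.2) -> Dict[str, float]:
    """Map each phase slower than its historical median by more than
    `tolerance` (0.2 = 20%) to its current/median duration ratio."""
    slow: Dict[str, float] = {}
    for name, seconds in current.items():
        past = history.get(name)
        if not past:
            continue  # no baseline yet for this phase
        baseline = statistics.median(past)
        if baseline > 0 and seconds > baseline * (1 + tolerance):
            slow[name] = seconds / baseline
    return slow


if __name__ == "__main__":
    # Sample durations in seconds, for illustration only.
    history = {"setup": [100.0, 110.0, 105.0], "execution": [600.0, 610.0, 590.0]}
    current = {"setup": 180.0, "execution": 605.0, "archive": 12.0}
    ratios = find_slowdowns(current, history)
    print({name: round(r, 2) for name, r in ratios.items()})  # {'setup': 1.71}
```

In a harness, each step would run inside `with timer.phase("setup"): ...`, and `timer.durations` would be appended to the stored history after the comparison.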
Asking yourself and your team these questions regularly keeps everyone on the lookout for potential technical debt and gives you time either to do the necessary work right away or, if constraints dictate, at least to record it so it can be done in the future. Like any other backlog work, technical debt should be tracked and prioritized according to cost, impact, and risk. These items should also be part of any regular requirements or user story grooming in your agile process. Keeping technical debt alongside the project’s requirements keeps it at the front of your team members’ minds and increases the likelihood that it will be addressed.
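One lightweight way to make debt items sortable alongside other backlog work is to score them. The Python sketch below uses a hypothetical heuristic (impact × risk ÷ cost, each on a 1 to 5 scale); the scale, the formula, and the `DebtItem` and `prioritize` names are assumptions for illustration, and a team would substitute whatever weighting its own grooming process uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DebtItem:
    """A performance-test technical debt item scored on a hypothetical 1-5 scale."""
    title: str
    cost: int    # effort to address: 1 = trivial, 5 = large
    impact: int  # value of addressing it: 1 = low, 5 = high
    risk: int    # risk of leaving it unaddressed: 1 = low, 5 = high

    def __post_init__(self) -> None:
        for field_name in ("cost", "impact", "risk"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5, got {value}")

    @property
    def priority(self) -> float:
        # Hypothetical heuristic: favor high impact and risk, penalize high cost.
        return self.impact * self.risk / self.cost


def prioritize(items: List[DebtItem]) -> List[DebtItem]:
    """Return the items ordered from highest to lowest priority score."""
    return sorted(items, key=lambda item: item.priority, reverse=True)


if __name__ == "__main__":
    backlog = [
        DebtItem("Retire obsolete login test", cost=1, impact=2, risk=1),
        DebtItem("Restore metrics collection after credential expiry", cost=1, impact=5, risk=5),
        DebtItem("Investigate slower environment setup", cost=4, impact=4, risk=3),
    ]
    for item in prioritize(backlog):
        print(f"{item.priority:5.1f}  {item.title}")
```

The score is only a starting point for discussion during grooming; its value lies in keeping debt items visible and comparable, not in the precision of the number.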