One of the key tenets of continuous integration is to reduce the time between a change being made and the discovery of defects within that change. “Fail fast” is the mantra we often use to communicate this tenet. This approach provides us with the benefit of allowing our development teams to quickly pinpoint the source of an issue compared to the old method of waiting weeks or months between a development phase and a test phase.
For this approach to work, however, our development and QA teams have to be able to run a consistent suite of automated tests regularly, and these tests must have sufficient coverage to ensure a high likelihood of catching the most critical bugs. If a test suite is too limited in scope, then it misses many important issues; a test suite that takes too long to run will increase the time between the introduction of a defect and our tester raising the issue. This is why we introduce and continue to drive automated testing in our agile environments.
I’ve observed a recurring set of three major factors that, when present, significantly increase the effectiveness of our tests in a continuous integration environment:
- Flexibility: Our tests must be able to be executed on demand at any time.
- Coverage: Our test coverage must be maximized with respect to the time available.
- Effectiveness: We must be able to catch the hard-to-pinpoint defects immediately.
When the concepts of continuous integration and continuous testing were introduced to me some years ago, the discussion centered primarily around unit and functional testing. Our teams added unit tests to their code, and the test team wrote a small set of automated functional tests that could be run on demand. Performance tests, however, were still largely relegated to the back of the room until the project was nearly completed. We thought it necessary to wait until functional testing was almost over in order to get the level of quality “high enough” so that performance testing would run without (functional) issues.
Whether by experimentation, thoughtful foresight, suboptimal project schedules, or sheer luck, we found that pulling performance tests up so that they ran earlier and more often dramatically increased the value of those performance tests. Not only did we begin finding the really sticky, messy bugs earlier in the project, but our performance tests also provided a nice measure of augmentation to the functional tests we already had in place.
Looking at the three factors mentioned above, it is easy to see why. Performance test suites meet these three factors more often than not, and as such, they can be excellent candidates for running in a continuous fashion.
Flexibility: Performance Tests Are Automated
Performance tests are, by their very nature, almost always automated. They have to be because it is very difficult to drive large levels of load or volume using manual testing methods. Clicking a “submit” button ten thousand times in succession is far more difficult and far less repeatable than submitting the same transaction via an automated test.
Because of this high degree of automation inherent in performance tests, they can be executed any time as needed, including off days and weekends. This flexibility allows our teams to run tests overnight on changes that are made late in the day, before the testers and developers arrive the next morning.
Coverage: Performance Tests Quickly Cover Broad Areas of Functionality
Performance tests generally provide “good enough” coverage of major functions without going too deep into the functionality. They cover a broad swath of commonly used functions in a short amount of time. If a functional bug exists in a major feature, it very often gets caught in the net of a performance test. Your performance tester is likely to be one of the first to begin screaming about a major bug in a functional feature. This is not to say that continuous performance testing can or should take the place of automated functional testing, but performance tests do, inherently, add a strong measure of functional validation.
You’ll want to be cautious to not allow your performance tests to become the de facto functional tests, as doing so can cause the team to lose focus on finding performance issues. When used together, however, functional and performance tests become effective partners in finding those bugs that otherwise bring your testing to a grinding halt.
Effectiveness: Performance Tests Catch Hard-to-Pinpoint Defects Immediately
Another important lesson I’ve learned managing performance test teams is that it’s rare for a performance issue to be caused by a code change that was made to intentionally impact performance. In other words, the majority of performance-related bugs occur in otherwise innocuous code. Quite often, we find that a defective change has a very minor performance impact when the lines of code are executed once, but when executed thousands or millions of times, they have a major cumulative slowing effect.
Consider the otherwise harmless line of code that, when changed, creates a performance delay of, say, only ten milliseconds per iteration. Now assume that the code iterates through that loop ten times per transaction. That ten-millisecond delay per loop is now compounded into a hundred-millisecond delay per transaction. If we multiply that one-tenth of a second delay by hundreds or even thousands of transactions per second, this tiny performance delay is now causing a major decrease in the number of transactions our system can process per second.
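To make the arithmetic concrete, here is a minimal sketch in Python. Only the ten-millisecond delay and the ten iterations per transaction come from the example above; the 100 ms baseline transaction time and the simple model of a single worker processing transactions back to back are assumptions chosen for illustration.

```python
def throughput_per_worker(service_time_ms: float) -> float:
    """Transactions per second one worker completes when running back to back."""
    return 1000.0 / service_time_ms


DELAY_PER_ITERATION_MS = 10      # from the example in the text
ITERATIONS_PER_TRANSACTION = 10  # from the example in the text
BASELINE_MS = 100                # assumed transaction time before the change

# Ten iterations at ten milliseconds each add 100 ms to every transaction.
added_delay_ms = DELAY_PER_ITERATION_MS * ITERATIONS_PER_TRANSACTION

before = throughput_per_worker(BASELINE_MS)
after = throughput_per_worker(BASELINE_MS + added_delay_ms)

print(f"Added delay per transaction: {added_delay_ms} ms")
print(f"Throughput per worker: {before:.1f} -> {after:.1f} transactions/second")
```

Under these assumptions, a change that looks trivial in isolation halves what each worker can process, which is the kind of cumulative effect a load test exposes and a single functional run does not.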
Now, let’s say the developer introduces this change on a Monday (with no intention of affecting performance, for better or worse, of course) and moves on to a different area of code on Tuesday. Our test team begins performance testing two weeks later and the issue is caught at that time. By now, two weeks’ worth of development has occurred, and the developer who introduced the issue has changed his focus multiple times, working with four or five modules other than the one with the issue. To the developer, the code change that caused the issue might be considered so minor that he forgets he even made the change. When this issue is investigated two weeks after the bug was first introduced, our developer and tester will undergo a painful and time-consuming troubleshooting process in order to identify the root of the issue.
Consider the alternative scenario of the test team that runs continuous performance testing. This team executes the same set of performance tests every night and, therefore, would notice on Tuesday morning that Monday night’s tests are slower than the tests run over the weekend. Because the performance tests are run daily, the developers need only look back at Monday’s code changes to find the culprit.
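The nightly comparison described above can be sketched in a few lines of Python. The run labels, the response times, and the 10 percent tolerance are all hypothetical values picked for illustration; a real team would feed in its own measurements and choose its own threshold.

```python
from typing import Dict, Optional


def first_regression(run_times_ms: Dict[str, float],
                     tolerance: float = 0.10) -> Optional[str]:
    """Return the label of the first run that is slower than the run
    before it by more than `tolerance`, or None if no run regressed."""
    labels = list(run_times_ms)  # dicts preserve insertion (chronological) order
    for previous, current in zip(labels, labels[1:]):
        if run_times_ms[current] > run_times_ms[previous] * (1 + tolerance):
            return current
    return None


# Hypothetical median response times (ms) from consecutive test runs.
runs = {"Saturday": 200.0, "Sunday": 202.0, "Monday": 305.0}

culprit = first_regression(runs)
print(f"First slow run: {culprit}")
```

With these sample numbers the check flags Monday's run, so the team only has to review the changes committed on Monday. The longer the gap between runs, the more commits sit between two data points and the less a flag like this narrows the search.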
The key here is that functional changes are generally prescriptive. By this, I mean that a functional code change makes the system behave differently by design and by intention. Performance changes, however, especially negative performance changes, are less likely to be prescriptive and more likely to be an unintentional side effect of an otherwise well-intended change.
Identifying and eliminating these unintentional side effects and figuring out why a system is slowing down becomes increasingly difficult as more time passes between the introduction of the issue and when our tester catches it. If the next performance test doesn’t occur until weeks or even months later, performing root cause analysis on the issue can become next to impossible. Catching these types of performance issues quickly is key to giving your developers the best chance of pinpointing the source of the bug and fixing it. Developers and testers alike will be able to spend less time searching for the proverbial needle in the haystack and more time focusing on getting the product ready for a quality release.
Scaling Up Your Performance Testing
If you don’t do performance testing, start now! Even basic performance tests can provide major benefits when run in a continuous fashion. Start with a single transaction, parameterize the test to accept a list of test inputs/data, and scale that transaction up using a free tool such as JMeter or The Grinder. Add additional transactions one at a time until you’ve got a good sampling of the most important transactions in your system. Today’s performance test tools are much easier to use than previous generations, and most basic tools today support features that were once considered advanced, such as parameterization, assertions (validation of system responses), distributed load generation, and basic reporting.
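As a rough illustration of that starting point, the sketch below parameterizes one transaction, runs it concurrently, and applies a simple assertion to each response. It uses only the Python standard library and is not a substitute for a tool such as JMeter or The Grinder. The `submit_order` function is a hypothetical stand-in for a real request to your system, and the worker count and input list are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def submit_order(order_id: int) -> dict:
    """Hypothetical transaction under test; replace with a real request."""
    time.sleep(0.001)  # simulated processing time
    return {"order_id": order_id, "status": "ok"}


def timed_call(order_id: int) -> tuple:
    """Run one transaction and return (elapsed milliseconds, assertion passed)."""
    start = time.perf_counter()
    response = submit_order(order_id)
    elapsed_ms = (time.perf_counter() - start) * 1000
    passed = response["status"] == "ok"  # validate the system response
    return elapsed_ms, passed


def run_load(inputs, workers: int = 8) -> dict:
    """Drive the transaction concurrently, once for each test input."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(timed_call, inputs))
    timings = [ms for ms, _ in results]
    failures = sum(1 for _, passed in results if not passed)
    return {"count": len(results), "failures": failures,
            "median_ms": median(timings)}


summary = run_load(range(1, 201))  # 200 parameterized inputs
print(summary)
```

Once a single transaction runs this way, adding the next one is mostly a matter of writing another function like `submit_order` and another list of inputs, which mirrors the one-at-a-time approach described above.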
If you do performance testing, but only occasionally or at the end of a project, select a subset of those tests and run them every day. Or, if constraints dictate otherwise (such as test environment availability), run them as often as you possibly can, even if that means running them weekly or less often. The key here is to increase the number of repetitions and reduce the amount of time between them, failing as fast as possible. Remember, the word “continuous” doesn’t have to mean “constant.”
Report the results of your continuous performance tests in a way that makes them accessible to everyone who needs them. I recommend a dashboard that provides an at-a-glance overview of the current state of your performance tests with the ability to drill down into more detailed results.
Most importantly, get your testers and developers involved and reviewing the results. Akin to the old adage of the tree falling in the woods, if your performance tests are screaming “fail, fail” but no one is listening, are your tests really making a sound at all?
Troubleshooting and fixing performance issues is difficult enough without having to wade through weeks or months of code changes to find the source of an issue. By closing the gap between the time a performance issue is introduced and the time we find it, we simplify the process of troubleshooting, eliminate a major source of frustration, and give our teams more time to work on the overall quality of our products. Because they contain a compelling mix of flexibility, coverage, and effectiveness, performance tests are very often powerful candidates for continuous testing.