The Power of Continuous Performance Testing

Summary:
Continuous performance testing gives your development teams a fighting chance against hard-to-diagnose performance and load-handling bugs, and it also helps them quickly identify major functional bugs. Thanks to their combination of flexibility, coverage, and effectiveness, performance tests are powerful candidates for continuous testing.

One of the key tenets of continuous integration is to reduce the time between a change being made and the discovery of defects within that change. “Fail fast” is the mantra we often use to communicate this tenet. This approach provides us with the benefit of allowing our development teams to quickly pinpoint the source of an issue compared to the old method of waiting weeks or months between a development phase and a test phase.

For this approach to work, however, our development and QA teams have to be able to run a consistent suite of automated tests regularly, and these tests must have sufficient coverage to ensure a high likelihood of catching the most critical bugs. If a test suite is too limited in scope, then it misses many important issues; a test suite that takes too long to run will increase the time between the introduction of a defect and our tester raising the issue. This is why we introduce and continue to drive automated testing in our agile environments.

I’ve observed a recurring set of three major factors that, when present, significantly increase the effectiveness of our tests in a continuous integration environment:

  • Flexibility: Our tests must be able to be executed on demand at any time.
  • Coverage: Our test coverage must be maximized with respect to the time available.
  • Effectiveness: We must be able to catch the hard-to-pinpoint defects immediately.

When the concepts of continuous integration and continuous testing were introduced to me some years ago, the discussion centered primarily on unit and functional testing. Our teams built unit tests into their code, and the test team wrote a small set of automated functional tests that could be run on demand. Performance tests, however, were still largely relegated to the back of the room until the project was nearly complete. We thought it necessary to wait until functional testing was almost over in order to get the level of quality “high enough” that performance testing would run without (functional) issues.

Whether by experimentation, thoughtful foresight, suboptimal project schedules, or sheer luck, we found that pulling performance tests up so that they ran earlier and more often dramatically increased the value of those performance tests. Not only did we begin finding the really sticky, messy bugs earlier in the project, but our performance tests also provided a nice measure of augmentation to the functional tests we already had in place.

Looking at the three factors mentioned above, it is easy to see why. Performance test suites meet these three factors more often than not, and as such, they can be excellent candidates for running in a continuous fashion.

Flexibility: Performance Tests Are Automated

Performance tests are, by their very nature, almost always automated. They have to be, because it is very difficult to drive large levels of load or volume using manual testing methods. Clicking the “submit” button ten thousand times in succession is far more difficult and far less repeatable than submitting the same transaction via an automated test.

Because of this high degree of automation inherent in performance tests, they can be executed any time as needed, including off days and weekends. This flexibility allows our teams to run tests overnight on changes that are made late in the day, before the testers and developers arrive the next morning.

Coverage: Performance Tests Quickly Cover Broad Areas of Functionality

Performance tests generally provide “good enough” coverage of major functions without going too deep into the functionality. They cover a broad swath of commonly used functions in a short amount of time. If a functional bug exists in a major feature, it very often gets caught in the net of a performance test. Your performance tester is likely to be one of the first to begin screaming about a major bug in a functional feature. This is not to say that continuous performance testing can or should take the place of automated functional testing, but performance tests do, inherently, add a strong measure of functional validation.

You’ll want to be careful not to let your performance tests become the de facto functional tests, as doing so can cause the team to lose focus on finding performance issues. When used together, however, functional and performance tests become effective partners in finding those bugs that would otherwise bring your testing to a grinding halt.

Effectiveness: Performance Tests Catch Hard-to-Pinpoint Defects Immediately

Another important lesson I’ve learned managing performance test teams is that it’s rare for a performance issue to be caused by a code change that was made to intentionally impact performance. In other words, the majority of performance-related bugs occur in otherwise innocuous code. Quite often, we find that a defective change has a very minor performance impact when the lines of code are executed once, but when executed thousands or millions of times, they have a major cumulative slowing effect.

Consider the otherwise harmless line of code that, when changed, creates a performance delay of, say, only ten milliseconds per iteration. Now assume that the code iterates through that loop ten times per transaction. That ten-millisecond delay per loop is now compounded into a hundred-millisecond delay per transaction. If we multiply that one-tenth of a second delay by hundreds or even thousands of transactions per second, this tiny performance delay is now causing a major decrease in the number of transactions our system can process per second.
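
To make that arithmetic concrete, here is a quick back-of-the-envelope sketch. The numbers, including the 200-millisecond baseline transaction time, are purely illustrative assumptions:

```python
# Back-of-the-envelope impact of a small per-iteration delay.
# All numbers below are illustrative assumptions, not measurements.
delay_per_iteration_s = 0.010   # 10 ms added to one line inside a loop
iterations_per_txn = 10         # the loop runs ten times per transaction
baseline_txn_time_s = 0.200     # assume a transaction took 200 ms before the change

added_per_txn_s = delay_per_iteration_s * iterations_per_txn   # 0.1 s extra per transaction
new_txn_time_s = baseline_txn_time_s + added_per_txn_s

# Throughput of a single worker before and after the change:
print(f"Added delay per transaction: {added_per_txn_s * 1000:.0f} ms")
print(f"Per-worker throughput: {1 / baseline_txn_time_s:.1f} -> {1 / new_txn_time_s:.1f} transactions/s")
```

With these assumed numbers, a single worker drops from 5.0 to roughly 3.3 transactions per second, and that loss is repeated across every concurrent worker in the system.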

Now, let’s say the developer introduces this change on a Monday (with no intention of impacting performance one way or the other, of course) and moves on to a different area of code on Tuesday. Our test team begins performance testing two weeks later, and the issue is caught at that time. By now, two weeks’ worth of development has occurred, and the developer who introduced the issue has changed his focus multiple times, working with four or five modules other than the one with the issue. To the developer, the code change that caused the issue might be considered so minor that he forgets he even made it. When this issue is investigated two weeks after the bug was first introduced, our developer and tester will undergo a painful and time-consuming troubleshooting process in order to identify the root of the issue.

Consider the alternative scenario of the test team that runs continuous performance testing. This team executes the same set of performance tests every night and, therefore, would notice on Tuesday morning that Monday night’s tests are slower than the tests run over the weekend. Because the performance tests are run daily, the developers need only look back at Monday’s code changes to find the culprit.
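
As a rough illustration of what that Tuesday-morning comparison might look like when automated, the sketch below flags transactions whose mean response time regressed against a baseline run. The file names, the result format, and the 15 percent threshold are all assumptions made for the example:

```python
# Minimal nightly regression check: compare last night's mean response times
# against a baseline run and flag anything that slowed down by more than a
# chosen threshold. File names, format, and threshold are illustrative.
import json

REGRESSION_THRESHOLD = 0.15  # flag a transaction that is >15% slower than baseline

def load_results(path):
    # Expected format: {"transaction_name": mean_response_time_ms, ...}
    with open(path) as f:
        return json.load(f)

baseline = load_results("results/baseline.json")
latest = load_results("results/last_night.json")

for name, base_ms in baseline.items():
    latest_ms = latest.get(name)
    if latest_ms is None:
        continue  # transaction missing from last night's run; investigate separately
    if latest_ms > base_ms * (1 + REGRESSION_THRESHOLD):
        print(f"REGRESSION: {name} went from {base_ms:.0f} ms to {latest_ms:.0f} ms")
```

A check like this can run as the last step of the nightly job so that a regression shows up as a failed build rather than a number someone has to remember to look at.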

The key here is that functional changes are generally prescriptive. By this, I mean that a functional code change makes the system behave differently by design and by intention. Performance changes, however, especially negative performance changes, are less likely to be prescriptive and more likely to be an unintentional side effect of an otherwise well-intended change.

Identifying and eliminating these unintentional side effects and figuring out why a system is slowing down becomes increasingly difficult as more time passes between the introduction of the issue and when our tester catches it. If the next performance test doesn’t occur for weeks or even months later, performing root cause analysis on the issue can become next to impossible. Catching these types of performance issues quickly is key to giving your developers the best chance of pinpointing the source of the bug and fixing it. Developers and testers alike will be able to spend less time searching for the proverbial needle in the haystack and more time focusing on getting the product ready for a quality release.

Scaling Up Your Performance Testing

If you don’t do performance testing, start now! Even basic performance tests can provide major benefits when run in a continuous fashion. Start with a single transaction, parameterize the test to accept a list of test inputs/data, and scale that transaction up using a free tool such as JMeter or The Grinder. Add additional transactions one at a time until you’ve got a good sampling of the most important transactions in your system. Today’s performance test tools are much easier to use than previous generations, and most basic tools today support features that were once considered advanced, such as parameterization, assertions (validation of system responses), distributed load generation, and basic reporting.
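
If it helps to see the shape of such a test before you pick a tool, here is a minimal, tool-agnostic sketch of one parameterized transaction driven by a handful of concurrent users. The URL, input file, and concurrency level are placeholders, and a dedicated tool such as JMeter or The Grinder will handle ramp-up, assertions, and reporting far more gracefully:

```python
# Minimal parameterized load sketch using only the Python standard library.
# The URL, input file, and concurrency level are placeholders; a real tool
# such as JMeter or The Grinder adds ramp-up, richer assertions, and reporting.
import csv
import time
import urllib.request
from urllib.parse import quote
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://localhost:8080/search?q={term}"  # placeholder endpoint
CONCURRENT_USERS = 10                                  # placeholder load level

def load_inputs(path="search_terms.csv"):
    # One test input per row; this is the "parameterize with a list of data" step.
    with open(path) as f:
        return [row[0] for row in csv.reader(f) if row]

def run_transaction(term):
    start = time.perf_counter()
    with urllib.request.urlopen(TARGET_URL.format(term=quote(term)), timeout=10) as resp:
        ok = resp.status == 200  # a crude assertion on the response
    return term, ok, (time.perf_counter() - start) * 1000

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
        for term, ok, elapsed_ms in pool.map(run_transaction, load_inputs()):
            print(f"{'OK' if ok else 'FAIL'} {term} {elapsed_ms:.0f} ms")
```

Once a single transaction like this is stable, adding the next one is mostly a matter of adding another request function and another list of inputs.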

If you do performance testing, but only occasionally or at the end of a project, select a subsection of those tests and run them every day. Or, if constraints dictate otherwise (such as test environment availability), run them as often as you possibly can, even if that means running them weekly or less often. The key here is to increase the number of repetitions and reduce the amount of time between them, failing as fast as possible. Remember, the word “continuous” doesn’t have to mean “constant.”

Report the results of your continuous performance tests in a way that makes them accessible to everyone who needs them. I recommend a dashboard that provides an at-a-glance overview of the current state of your performance tests with the ability to drill down into more detailed results.
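
One lightweight way to feed such a dashboard is to append a one-line summary of every nightly run to a history file the dashboard can chart. The fields and file location below are just an illustrative starting point:

```python
# Append a per-transaction summary of each nightly run to a CSV history file
# that a dashboard (or even a spreadsheet) can chart over time.
# The file location and fields are illustrative.
import csv
import datetime
import os

HISTORY_FILE = "results/history.csv"  # placeholder location

def record_run(transaction, mean_ms, p95_ms, error_rate):
    is_new_file = not os.path.exists(HISTORY_FILE)
    with open(HISTORY_FILE, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new_file:
            writer.writerow(["date", "transaction", "mean_ms", "p95_ms", "error_rate"])
        writer.writerow([datetime.date.today().isoformat(), transaction,
                         round(mean_ms, 1), round(p95_ms, 1), error_rate])

# Example usage with made-up numbers from last night's run:
record_run("submit_order", mean_ms=212.4, p95_ms=480.0, error_rate=0.002)
```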

Most importantly, get your testers and developers involved and reviewing the results. Akin to the old adage of the tree falling in the woods, if your performance tests are screaming “fail, fail” but no one is listening, are your tests really making a sound at all?

Conclusion

Troubleshooting and fixing performance issues is difficult enough without having to wade through weeks or months of code changes to find the source of an issue. By closing the gap between the time a performance issue is introduced and the time we find it, we simplify the process of troubleshooting, eliminate a major source of frustration, and give our teams more time to work on the overall quality of our products. Because they offer a compelling mix of flexibility, coverage, and effectiveness, performance tests are very often powerful candidates for continuous testing.

User Comments

Grumpy Pants:

Thank you for writing a great article. During your nightly performance tests, do you baseline with a single user or actually perform a load? And if you do a load, how much and why? :) Thanks!

September 1, 2015 - 10:02am
Don Prather:

Hi GP! Good question. I recommend running as much load as the system can handle, up to and including the amount of load expected during peak periods once the system is in production. In practice, the feasibility of this approach will depend on the current state of your system in the test environment and the current robustness of your tests. Here is what I mean. If you are very early in development and you know that the system will not handle more than a nominal level of load without failing ungracefully, then it probably doesn't add much value to run excessive load only to have the test fail repeatedly. In this case (after making sure my team had a defect raised), I would run the amount of load I was confident the system could reasonably handle. This provides the benefit of being able to detect regressions versus the current state of the system while still providing some functional coverage of those areas covered by your performance tests. Adding some robustness to your tests can help you stretch the limits of what you run during the evening, particularly if a test can clean up after itself and doesn't get stuck the first time it fails. In addition, it pays to be adaptive. For example, weekends are great times for running 48-hour endurance tests that start on Friday afternoon and end on Sunday afternoon.

To summarize, I definitely recommend running some load greater than a single user during nightly tests. 

September 1, 2015 - 1:01pm
Marc Rice:

Hi,

This is an interesting article. Some questions I have are around the scripting part of the performance testing. Best practice usually dictates that performance testing is done on the final code that is expected to go live. I understand the approach detailed above is different, so how do you deal with changes in the code that impact the performance test scripts you use?

I currently use Performance Center/LoadRunner, and we have numerous occasions where a code change requires us to re-record an application and then re-create the script. This obviously takes time and would make it difficult to run a performance test the night of a code delivery.

I'm interested to hear your thoughts on this.

Thanks,

Marc.

September 2, 2015 - 4:09am
Don Prather:

Hi Marc,

This is a good question, and it is true that running performance tests against a new or rapidly changing application can be quite difficult. In this situation, I would utilize some of the same techniques I would use if I were creating functional tests in a CI environment. First, I recommend specifying and writing your performance tests as early as possible in the lifecycle. Second, and just as important, is to share these specifications and tests with your developers so they understand what the tests do and how they do it. Often, developers who are aware of your tests can make changes in such a way that they satisfy the requirements for which they are coding while, at the same time, not breaking your tests and maintaining good testability. For example, a developer who understands that your performance test uses the ID tags of a collection of HTML objects can make his or her changes without impacting these tags or, at least, notify you when the tags do need to change. While this is a simple example, there are many ways in which developers who are aware of the way your tests run can keep one eye on the business requirements and another on the testability of the application, making changes that minimize or even eliminate your need to re-record and recode your scripts.

With that said, a situation in which an application is very new, or one that is undergoing significant changes that cause the performance tests to fail (functionally) most nights, may indicate that the AUT is not quite ready to have performance tests executed against it. If this is the case, I recommend continuing to communicate the details of your tests to your development team and attempting a run every now and then so the team understands where the quality of the code is in relation to the functionality covered by the performance tests. The goal here is to be able to identify performance issues as early in the lifecycle as possible, giving your development teams the highest chance of fixing them before the application is released to customers.

 

Thanks!

-Don

September 2, 2015 - 1:52pm
Richard Friedman:

Don, great article. I very much believe in integrating performance into the process, but for continuous testing I like to keep it simple, small, and cheap. Even a small load test hitting dev instances can reveal performance issues. At some point, hopefully via an iterative development lifecycle, you can run larger tests.

Specific to the question above, if teams are checking in code with unit or, even better, functional tests, could we integrate those directly into the performance testing? I am a JMeter fan, but I also believe in using the right tool for the right job. Twitter's IAGO had an interesting idea of using transaction logs as the replay mechanism: https://github.com/twitter/iago  Have you tried anything like that?

I think my writing has paralleled yours as well. https://www.redline13.com/blog/recent-posts/

 

September 3, 2015 - 7:50pm
Don Prather:

Hi Richard,

I am definitely a fan of re-using or leveraging existing tests as a foundation for performance tests. I can see this approach working better with functional tests than with unit tests, as unit tests tend to be targeted to one specific area of code, whereas functional tests are generally broader in scope and closer to the business-type transactions we use for performance tests. In fact, I might even go as far as to say that if you take a functional test, parameterize it so it has a large and diverse range of data to utilize, and create a mechanism to iterate through this data and control the arrival rate of the transactions, you have a very good foundation for a performance test. In practice, this is much more difficult to accomplish than in theory, but perhaps it serves as a model toward which we can strive.

I checked out Twitter's IAGO, and I am intrigued. One of the challenges of performance testing is figuring out how to create the size and diversity of test data needed to run performance tests, and using transaction logs could help solve this problem. I will have to do some more reading on this tool.

Thanks!

-Don

September 8, 2015 - 3:31pm
Muruga das:

Nice article. Automating the performance tests and executing them regularly will give an overview of the product's performance and will also be helpful if any performance degradation has happened in recent days.

September 16, 2015 - 7:39am
Daniel Moll:

Hi Don,

Thanks for the interesting read! At my company, we are experimenting with continuous performance testing using a framework built on a number of open source components, like Gatling and Graphite. I have built a dashboard application that can be used for organizing, analyzing, and automatically benchmarking test results. It is available on GitHub: https://github.com/dmoll1974/targets-io. I still need to write the documentation, but I have set up a demo environment with Docker Compose so you can check out the features. I'd love to hear your feedback if you find the time to check it out.

Cheers,

Daniel

November 28, 2015 - 3:09pm
K Y:

Hi Don. Questions: Because we are testing early, not all features are developed yet. What information can you get out of performance tests that do not run with close to expected production load and functionality? How do you make use of this information?

One way I see is to build a capacity model out of a measurement of the amount of system resources each piece of functionality consumes. From this capacity model we could judge whether one piece of functionality is consuming too many resources and potentially not leaving enough for other functionality that needs to run concurrently. What's your opinion on this approach?

January 11, 2016 - 11:38am
