For Great Performance, Rethink Your Load Testing

Summary:

The word concurrency is often used to define workload for load testing, as in concurrent users. Too often, it's the only input defined. In reality, a number of factors contribute to workload and affect concurrency, and they all shape how you should load test—and, ultimately, the performance of your product.

Getting an Accurate Read on Concurrent Users

A common method for calculating target concurrent users is multiplying the number of unique visitors in a given period by the average visit duration as a fraction of that period. For example, 12,000 unique visitors per hour who each spend fifteen minutes using your product per visit works out to 3,000 concurrent users on average (12,000 × 15 / 60).
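
To make the arithmetic concrete, here is a quick sketch of that estimate expressed as Little's Law (average concurrency = arrival rate × average time in the system), using the numbers from the example above:

    // Estimate average concurrent users via Little's Law: L = lambda * W.
    // The figures match the example above; yours will differ.
    public class ConcurrencyEstimate {
        public static void main(String[] args) {
            double visitorsPerHour = 12_000;                      // unique visitors per hour
            double visitMinutes = 15;                             // average visit duration
            double arrivalsPerMinute = visitorsPerHour / 60.0;    // lambda = 200 arrivals/min
            double concurrent = arrivalsPerMinute * visitMinutes; // L = 200 * 15 = 3,000
            System.out.printf("Average concurrent users: %.0f%n", concurrent);
        }
    }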

However, there are several problems with this approach:

  • Visitors are unlikely to be spread evenly across the one-hour period sampled. They are more likely to follow some form of Poisson process.
  • The number of unique visitors may be miscalculated. Depending on the sampling method, the count can be misleading—particularly if it relies on source IPs, which may be shared or proxied by multiple real users.
  • Concurrency contributes to, but does not define, overall workload. What were those users actually doing? Were they idle or busy?

More rigorous methods include profiling or tracing real users over different periods, or using descriptive statistics beyond the simple averages above—such as a frequency distribution or box plot—to better model concurrent use.

It’s important to get concurrency right because, ultimately, the workload is most likely processed by some form of queuing system, and concurrency affects the arrival rate at those queues.

In real-life computer systems, distribution is rarely uniform, so it is good to try to observe, understand, and model concurrency within your system under test.
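
If you want a feel for what a nonuniform arrival pattern looks like, here is a minimal sketch that samples the exponentially distributed gaps between arrivals in a Poisson process, reusing the arrival rate from the earlier example:

    import java.util.Random;

    // Sketch: sample interarrival gaps for a Poisson process via inverse-transform
    // sampling; gaps between consecutive arrivals are exponentially distributed.
    public class PoissonArrivals {
        public static void main(String[] args) {
            Random rng = new Random(42);
            double ratePerSecond = 12_000.0 / 3600.0; // ~3.33 arrivals/s, from the example
            double clock = 0;
            for (int i = 1; i <= 10; i++) {
                double gap = -Math.log(1 - rng.nextDouble()) / ratePerSecond;
                clock += gap;
                System.out.printf("user %2d arrives at t=%6.2fs (gap %.2fs)%n", i, clock, gap);
            }
        }
    }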

Starting a Load Test with Concurrent Users

Assuming you've created a model (good or bad), it's now time to turn to your favorite load testing tool and simulate that model under load. One popular option is Apache JMeter, an open source Java application for load testing and measuring performance.

At the heart of JMeter is the thread group, which lets you define the number of users, either the number of times to execute the test or a fixed duration for it, and a ramp-up period, which staggers the start of individual threads. For example, 3,000 concurrent users with a ramp-up of 300 seconds will start a new user every tenth of a second (ten users per second).
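
The implied stagger is simply the ramp-up period divided by the thread count, as this trivial sketch with the numbers above shows:

    // The delay a linear ramp-up implies between thread starts:
    // delay = rampUpSeconds / threadCount.
    public class RampUpDelay {
        public static void main(String[] args) {
            int threads = 3_000;
            int rampUpSeconds = 300;
            double delayMs = rampUpSeconds * 1000.0 / threads; // 100 ms per thread
            System.out.printf("One new user every %.0f ms (%.0f users/s)%n",
                    delayMs, threads / (double) rampUpSeconds);
        }
    }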

Bear in mind that this provides a linear, uniform distribution for starting threads, so unless your objective is to measure performance only after all users have started (ignoring startup itself), this method is unlikely to simulate a realistic load profile.

The popular JMeter plugins library provides an alternative called the Ultimate Thread Group, which, as its name implies, gives you more options for starting concurrent users. This includes variable quantities of threads, ramp-up times, and duration. With a little patience, you can plug the information you want into the UI and create more realistic concurrent load profiles.
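
The Ultimate Thread Group is configured through its UI, but if it helps to reason about the shape of the load first, here is a rough offline model of a stepped profile in the same spirit; the schedule rows and all their values are purely illustrative:

    // Rough offline model of a stepped load profile, in the spirit of the
    // Ultimate Thread Group's schedule rows: each row ramps in a batch of
    // threads, holds the load, then ramps down. All row values are invented.
    public class SteppedProfile {
        // {threads, initialDelay, startupTime, holdTime, shutdownTime} in seconds
        static final int[][] ROWS = {
            {1000, 0, 60, 300, 30},
            {2000, 120, 120, 240, 60},
        };

        static double activeThreads(int[] row, double t) {
            double rampStart = row[1];
            double rampEnd = rampStart + row[2];
            double holdEnd = rampEnd + row[3];
            double shutdownEnd = holdEnd + row[4];
            if (t < rampStart || t > shutdownEnd) return 0;
            if (t < rampEnd) return row[0] * (t - rampStart) / row[2];
            if (t < holdEnd) return row[0];
            return row[0] * (shutdownEnd - t) / row[4];
        }

        public static void main(String[] args) {
            for (int t = 0; t <= 600; t += 60) {
                double total = 0;
                for (int[] row : ROWS) total += activeThreads(row, t);
                System.out.printf("t=%3ds active=%4.0f%n", t, total);
            }
        }
    }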

For what it’s worth, I often find many performance-related defects during ramp-up, especially with more realistic models for starting users. This period is often discarded from test results, but it’s important for monitoring and measuring performance.

Simulating Throughput and Using Randomization

Now that your users have started and the test is under way, many testers will focus on this period of the test and use it as the basis for further observations and results. Some testers may refer to it as peak concurrent load or steady state load.

A key component aside from concurrency that will affect this period is throughput.

Throughput can be measured in many different ways, such as network throughput or number of requests per second. But ultimately, throughput is created by users performing some action on the system under test. As I mentioned earlier, a high concurrent user test might not mean much if the majority of users are just idle, so throughput is just as important to model and effectively simulate.

Throughput is most often impacted by things such as think time (time between user transactions) or pacing (time between each iteration), but it is also affected by the service time in any of the system queues, such as web app response time or network latency and bandwidth.

A Poisson process randomizes the time between each pair of consecutive events, assuming each of these interarrival times is independent of the others. To simulate that, JMeter has a decent set of timers that can be used to help randomize the interarrival times of user actions. I like to use the Gaussian Random Timer to randomly delay each request around a constant delay offset, according to a Gaussian (normal) distribution.
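
As a rough approximation (not JMeter's exact implementation), a timer like that computes each delay as a constant offset plus a normally distributed deviation; the offset and deviation values below are illustrative:

    import java.util.Random;

    // Approximation of a Gaussian random timer: a constant delay offset plus
    // a normally distributed deviation. Values are illustrative.
    public class GaussianThinkTime {
        public static void main(String[] args) {
            Random rng = new Random();
            long constantDelayMs = 3_000; // constant delay offset
            long deviationMs = 1_000;     // standard deviation around the offset
            for (int i = 0; i < 5; i++) {
                long delay = (long) Math.abs(rng.nextGaussian() * deviationMs + constantDelayMs);
                System.out.println("think time: " + delay + " ms");
            }
        }
    }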

JMeter also has a Test Action controller I like to use to add a variable pause at the end of each iteration. This helps control pacing, the rate at which each user moves through a list of transactions. Still, pacing between iterations can be difficult to get right, especially if you're aiming to simulate average visit duration, for example.

I believe the best way to get accurate simulations is to shake out your scripts manually with a single user and multiple iterations, so you get an understanding of timing between iterations.
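
One common, tool-agnostic way to implement pacing is to pause at the end of each iteration for whatever time remains of a fixed cadence. A sketch, with an illustrative ten-second pacing target and a stand-in for the real scripted work:

    // Common pacing pattern: pause at the end of each iteration just long
    // enough that iterations start at a fixed cadence, regardless of how
    // long the transactions themselves took. All timings are illustrative.
    public class IterationPacing {
        public static void main(String[] args) throws InterruptedException {
            long pacingMs = 10_000; // target time between iteration starts
            for (int i = 0; i < 3; i++) {
                long start = System.currentTimeMillis();
                doIterationWork();
                long elapsed = System.currentTimeMillis() - start;
                long pause = Math.max(0, pacingMs - elapsed);
                System.out.printf("iteration %d took %d ms, pausing %d ms%n", i, elapsed, pause);
                Thread.sleep(pause);
            }
        }

        static void doIterationWork() throws InterruptedException {
            Thread.sleep(1_500); // stand-in for the scripted transactions
        }
    }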

Frequency of execution for particular code blocks can be controlled with the Throughput Controller. I prefer to use percent execution so that I can weight a certain percentage of the iterations through the test plan.
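
The controller itself is configured in the test plan, but the idea behind percent execution can be sketched as weighted selection of transaction blocks; the transaction names and percentages below are made up for illustration:

    import java.util.Random;

    // Sketch of percent-based execution weighting, similar in spirit to the
    // Throughput Controller's percent executions mode: each iteration runs a
    // block with the given probability. Percentages here are invented.
    public class PercentExecution {
        public static void main(String[] args) {
            Random rng = new Random();
            int browse = 0, search = 0, checkout = 0;
            for (int i = 0; i < 10_000; i++) {
                double roll = rng.nextDouble() * 100;
                if (roll < 70) browse++;      // ~70% of iterations
                else if (roll < 95) search++; // ~25%
                else checkout++;              // ~5%
            }
            System.out.printf("browse=%d search=%d checkout=%d%n", browse, search, checkout);
        }
    }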

The Constant Throughput Timer can also be useful, as it introduces variable pauses calculated to keep the total throughput, in terms of samples per minute, as close as possible to a given figure, which itself can be random.

Translating samples per minute, as JMeter executes them, to your model's throughput targets can be difficult. A certain amount of effort and understanding is required to build accurate models.
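
A small worked conversion may help; the throughput target below is an assumed figure from a hypothetical workload model:

    // Translating a model's throughput target into the samples-per-minute
    // figure such timers expect, plus the implied per-thread rate.
    public class ThroughputTargets {
        public static void main(String[] args) {
            double targetRequestsPerSecond = 50; // from your workload model
            int activeThreads = 3_000;
            double samplesPerMinute = targetRequestsPerSecond * 60;       // 3,000/min total
            double perThreadPerMinute = samplesPerMinute / activeThreads; // 1/min per thread
            System.out.printf("Total: %.0f samples/min, per thread: %.2f samples/min%n",
                    samplesPerMinute, perThreadPerMinute);
        }
    }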

Creating a Holistic Load Testing Process

Before you get too excited about the massive concurrency numbers you can generate with JMeter or any other popular load testing tool, have a think about what concurrency actually means in your test plan. It is possible to have 100,000 users sitting idle or 1,000 users iterating faster than a thousand startled gazelles, effectively applying the same workload on the system under test. So, concurrency as a single measure lacks context.
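
A quick back-of-the-envelope sketch shows how two very different concurrency figures can drive the same request rate; all numbers are illustrative:

    // Why raw concurrency lacks context: throughput is roughly
    // users / (responseTime + thinkTime), so different user counts
    // with different pacing can apply the same load.
    public class ConcurrencyVsThroughput {
        static double requestsPerSecond(int users, double responseTimeS, double thinkTimeS) {
            return users / (responseTimeS + thinkTimeS);
        }

        public static void main(String[] args) {
            // 100,000 near-idle users: one request every ~500 s each
            System.out.printf("idle crowd: %.0f req/s%n", requestsPerSecond(100_000, 0.5, 499.5));
            // 1,000 busy users: one request every ~5 s each
            System.out.printf("busy few:   %.0f req/s%n", requestsPerSecond(1_000, 0.5, 4.5));
        }
    }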

Because concurrency on its own is not enough to define system workload, other important metrics you should look at include throughput and errors. You can view throughput by network or requests, and I generally prefer the latter, as it ties in well with external monitoring tools.

Monitoring for errors from the client side is extremely important to help track down potential bottlenecks on the server side. For example, you should look for response codes other than HTTP 200, and view failed transaction counts.
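
In practice your load tool's assertions do this for you, but as a minimal illustration of the client-side signal (hypothetical URL, Java 11+), here is a check that flags any non-2xx response as a failure:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Minimal client-side error check: treat any non-2xx status as a failed
    // transaction, the same signal a load tool's assertions would record.
    public class ErrorCheck {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("https://example.com/")).build(); // hypothetical URL
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            boolean failed = response.statusCode() / 100 != 2;
            System.out.println("status=" + response.statusCode() + " failed=" + failed);
        }
    }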

Beware of averages and uniform distributions. Default settings in the majority of commercial and open source testing tools are normally not enough to simulate a realistic workload model.

The starting or ramping up of users in your test plan is just as important as "steady state" load. Don't discard the results!

Finally, always look for relationships between different metrics such as concurrency, throughput, and response time. Never ignore errors. The performance of your product depends on it.

User Comments

Xander Bartels:

Good article

I've also been testing "performance" for a long time, but the definition is a difficult one:

  • Performance validation: does the new release have equal, better, or acceptably lower performance under the same conditions?
  • Only change one parameter at a time.
  • Environment performance benchmarking, e.g., maximum capacity and scale-up and scale-down testing (making good rules).
  • I'm still struggling to generate a correct load profile too, but we often take the worst possible user journey (API requests) into account.
  • Try to model it and see if it fits the production metrics. I don't think you need a 100 percent correct model, but if it is 80 percent correct, you can already create good capacity plans, etc.

Cheers,

October 16, 2017 - 7:01am
