To cover your bases when testing performance, you may try writing a “performance equation” so you can check each factor. But the individual pieces do not always add up to the whole picture, and it is easy to overlook components that affect performance.
Many teams decide to put together a “test bed” of servers and network infrastructure, develop some scripts simulating user requests, run the whole thing against the application, and see if they can satisfy the business requirements. And to make sure they have extra capacity, they double the number of transactions.
But this approach seems to work only some of the time, which means it fails the rest of the time. And it is equally hard to explain why it works or why it fails.
Let’s take a closer look at the parts that make up the performance equation, Env + WL + Code = X:
- Env stands for test environment (the network, switches, load balancers, servers, database, and platform software)
- WL stands for workload scripts simulating user activity
- Code stands for application code, including in-house components and third-party libraries
- X is the set of performance characteristics, such as response time and throughput
The testing environment’s hardware and software affect the application’s performance in many ways. If replicating the production environment is too expensive, teams use the closest available environment for performance testing, often a user acceptance testing or preproduction environment. This creates a typical dilemma: a test environment that is less powerful than production may cause many unrealistic failures, while one that is more powerful than production will mask real performance problems.
While the risks of testing on significantly different hardware are readily seen, other differences might not be so obvious. Server settings, memory allocation, and process priorities are the first to consider. Application settings may also cause performance variation. For example, debugging or logging levels may significantly change the frequency and volume of file operations. The database structure and the complexity of its contents may have a significant performance impact on functions like search and reporting. To address that risk, you can use a masked copy of the production database.
It may take a few rounds of testing to uncover and remediate hidden factors of the environment component.
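One low-cost way to surface such hidden factors is to compare configuration snapshots from production and the test environment before each test round. The sketch below assumes both environments’ settings can be exported as flat key-value pairs; the setting names and values are hypothetical placeholders, not recommendations.

```python
# Compare settings exported from two environments and report every key that differs.
# HYPOTHETICAL snapshots: real ones might come from config files, admin consoles, or DB catalogs.
PROD_SETTINGS = {
    "app_memory_mb": 8192,
    "log_level": "WARN",
    "worker_threads": 64,
    "orders_table_rows": 12_000_000,
}
TEST_SETTINGS = {
    "app_memory_mb": 4096,
    "log_level": "DEBUG",
    "worker_threads": 64,
    "orders_table_rows": 50_000,
}


def diff_settings(prod: dict, test: dict) -> dict:
    """Return {setting: (prod_value, test_value)} for settings that differ or are missing."""
    keys = sorted(prod.keys() | test.keys())
    return {k: (prod.get(k), test.get(k)) for k in keys if prod.get(k) != test.get(k)}


if __name__ == "__main__":
    for name, (prod_value, test_value) in diff_settings(PROD_SETTINGS, TEST_SETTINGS).items():
        print(f"{name}: production={prod_value!r} test={test_value!r}")
```

A setting present in only one environment shows up with `None` on the other side, which is often exactly the kind of difference that goes unnoticed.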
Workload models are often based on service-level agreements and business requirements. Some common examples include the number of transactions, the number of concurrent users, and response times. A typical set of requirements might look like the following:
- Allow 10 percent of the total user base of 2,500 accounts to work concurrently
- Support performing 2,500 transactions per hour
- Page loading time should be no longer than five to ten seconds
- The system should stay up for a period of eight hours
Such requirements seem very straightforward, so they might be used directly as workload models. For example, based on the requirements above, you might simulate two hundred fifty concurrent users (10 percent of 2,500), each performing ten transactions per hour (2,500 transactions spread across 250 users), for eight hours, while also measuring page loading time. And yes, to test the extra capacity, let’s double that to twenty transactions per hour per user, or one every three minutes.
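The straightforward model can be derived mechanically from the requirements. This sketch only reproduces the arithmetic above; the class and function names are hypothetical, and `capacity_factor` stands in for the “double the transactions” rule of thumb.

```python
from dataclasses import dataclass


@dataclass
class WorkloadModel:
    concurrent_users: int
    tx_per_user_per_hour: float
    seconds_between_tx: float
    duration_hours: float


def workload_from_requirements(
    total_accounts: int,
    concurrent_share: float,
    tx_per_hour: float,
    duration_hours: float,
    capacity_factor: float = 1.0,
) -> WorkloadModel:
    """Spread the required hourly transactions evenly across the concurrent users."""
    users = round(total_accounts * concurrent_share)
    per_user = tx_per_hour * capacity_factor / users
    return WorkloadModel(users, per_user, 3600 / per_user, duration_hours)


baseline = workload_from_requirements(2_500, 0.10, 2_500, 8)
doubled = workload_from_requirements(2_500, 0.10, 2_500, 8, capacity_factor=2.0)
print(baseline)
print(doubled)
```

Note that this model spreads transactions evenly over time; it says nothing about how many of them happen at the same moment.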
Are we missing something with this straightforward model?
To gain another perspective, let’s think of application traffic as road traffic. A hundred-yard segment of road can be occupied by ten vehicles at once. But notice the difference in speed when there are just a few cars compared with when they’re driving bumper to bumper. Also notice that at any given point, only as many cars can pass side by side as there are lanes. And if the number of lanes varies, the sections that let the fewest cars pass at once will largely determine the speed of the traffic.
You probably see where this is going. Spreading transactions evenly over a period of time accounts for concurrent users but not for simultaneous transactions. While one may argue that it’s unlikely that all two hundred fifty users will press “Submit” at the same moment, what are the chances that ten would do so? Or even three? This is where we’d most likely encounter a bottleneck.
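To put rough numbers on those chances, one can model users as submitting independently at random times (Poisson arrivals) and assume each transaction stays in progress for a fixed time. Under an infinite-server (M/G/∞) queueing model, which ignores waiting for resources, the number of transactions in progress at any moment is Poisson-distributed with a mean of arrival rate times duration (Little’s law). The 5-second duration below is a hypothetical value, not a measurement, and the function names are illustrative.

```python
import math

USERS = 250
TX_PER_USER_PER_HOUR = 10
ARRIVAL_RATE = USERS * TX_PER_USER_PER_HOUR / 3600  # about 0.69 transactions per second
SERVICE_TIME_S = 5.0  # HYPOTHETICAL: seconds each transaction stays in progress

# Little's law: average number of transactions in progress at once (about 3.5 here).
MEAN_IN_PROGRESS = ARRIVAL_RATE * SERVICE_TIME_S


def prob_at_least(k: int, mean: float = MEAN_IN_PROGRESS) -> float:
    """P(N >= k) when N ~ Poisson(mean): random arrivals with no queueing (M/G/inf)."""
    below_k = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


def max_in_progress_evenly_paced(
    rate: float = ARRIVAL_RATE, duration: float = SERVICE_TIME_S
) -> int:
    """Most transactions ever in progress when arrivals are exactly 1/rate seconds apart."""
    return math.ceil(rate * duration)


if __name__ == "__main__":
    print("evenly paced, max in progress:", max_in_progress_evenly_paced())
    for k in (3, 5, 10):
        print(f"random arrivals, P(at least {k} in progress): {prob_at_least(k):.4f}")
```

Under these assumptions, evenly paced scripts never have more than four transactions in flight, while random arrivals have at least three in flight about two-thirds of the time and five or more roughly a quarter of the time. Ten at once is rare, well under 1 percent of the time, yet it is exactly the burst an evenly paced script never produces.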
And what about the total number of concurrent users? Most load testing tools and services price by the number of virtual users, and many companies don’t own a license for two hundred fifty virtual users. A typical workaround is to run the scripts two or three times faster to reach the required number of transactions. While the arithmetic is correct, this approach ignores that the number of user sessions has nonlinear effects on the software system. Each user session requires allocating resources (in the application’s memory, on the server, and in the database), and managing those resources carries its own performance cost. Thus, running the scripts three times faster doesn’t really substitute for the equivalent number of users.
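The arithmetic behind the workaround is easy to reproduce, and so is what it leaves out. This sketch assumes a hypothetical license limit of 100 virtual users; the function and field names are illustrative. Throughput matches the requirement, but the number of open sessions drops from 250 to 100, so the resource allocation and management for the missing sessions never happens during the test.

```python
def compressed_model(target_tx_per_hour: float, real_users: int, licensed_vusers: int) -> dict:
    """Pace fewer virtual users faster so total throughput still meets the target."""
    tx_per_vuser = target_tx_per_hour / licensed_vusers
    return {
        "tx_per_vuser_per_hour": tx_per_vuser,
        "seconds_between_tx": 3600 / tx_per_vuser,
        "total_tx_per_hour": tx_per_vuser * licensed_vusers,  # matches the target
        "open_sessions": licensed_vusers,  # versus real_users in production
        "missing_sessions": real_users - licensed_vusers,
    }


# HYPOTHETICAL license limit of 100 virtual users against the 250 real users required.
result = compressed_model(2_500, real_users=250, licensed_vusers=100)
print(result)
```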
In our performance equation, if we assume the code is the only variable, then any change in performance would point to the latest code drop.
Some code changes may indeed have a dramatic performance impact. But very often, small changes here and there accumulate for some time before performance problems become apparent. Unless you have a trend graph of performance benchmarks for each build, a slow decline might be hard to notice. And even when testing does reveal performance degradation, resolving it is difficult because there is no single root cause.
Constant workload models are intended to establish a baseline for tracking. They execute the same scripts with the same load pattern, which also means exercising the same application functions in the same way. Real users perform a variety of operations. Along with booking transactions, they may search, edit, remove data, and run reports. Those functions may be left out of the generated load, yet they have a real impact on performance when used in parallel with it.
There are also newly deployed features within existing functions, such as account lookup and shopping cart editing. If the scripts keep submitting transactions the same way, they don’t exercise many of those code changes.
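One way to bring those other operations into the generated load is to have each virtual user pick its next action from a weighted mix. This is a sketch, not a prescription: the operation names and weights are hypothetical and would ideally come from production usage data.

```python
import random
from collections import Counter
from typing import Optional

# HYPOTHETICAL operation mix; realistic weights would come from production logs or analytics.
OPERATION_MIX = {
    "book_transaction": 0.40,
    "search": 0.30,
    "edit_record": 0.15,
    "run_report": 0.10,
    "delete_record": 0.05,
}


def pick_operations(count: int, mix: dict = OPERATION_MIX, seed: Optional[int] = None) -> list:
    """Draw a sequence of operations for a virtual user, weighted by the mix."""
    rng = random.Random(seed)  # a seed makes a run repeatable for baseline comparisons
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


if __name__ == "__main__":
    sample = pick_operations(1_000, seed=42)
    print(Counter(sample).most_common())
```

Seeding the generator keeps a baseline repeatable while still exercising more than one path through the application.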
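Keeping a simple map from load scripts to the features they touch makes such gaps visible after each build. A sketch, assuming that map is maintained by hand or derived from tracing; all feature and script names are hypothetical.

```python
# HYPOTHETICAL inputs: features touched by the latest build and the features each load script exercises.
CHANGED_FEATURES = {"account_lookup", "cart_edit", "checkout_submit"}
SCRIPT_COVERAGE = {
    "submit_order": {"login", "checkout_submit"},
    "browse_catalog": {"login", "search"},
}


def uncovered_changes(changed: set[str], coverage: dict[str, set[str]]) -> list[str]:
    """Return changed features that no load script exercises."""
    exercised = set().union(*coverage.values())
    return sorted(changed - exercised)


if __name__ == "__main__":
    print(uncovered_changes(CHANGED_FEATURES, SCRIPT_COVERAGE))
```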
Solving for X
All testing is a blend of activities: learning, modeling, experimenting, and investigating. Testing is much more about discovering a system’s behaviors than verifying a few samples of expected behavior. Load and performance testing is no exception. Only through iterations of trial and error can we learn about the system and develop useful workload models.
Look for hidden parts in your performance equation—especially if it looks simple to solve.