Almost no one has built load testing correctly. Worse yet, our collective mental models seem to be damaged. I know these claims sound preposterous on the surface, but I hope that by the time I’m done, I will have convinced you they’re true. As someone who has built my own load test systems, I was taken aback when I discovered how flawed my thinking was. However, before I demonstrate the damage, let me start with a quick walk-through of how I think our collective model works.
We start with an environment that handles some sort of concurrent usage—perhaps a website or an API. Depending on exactly what type of load testing is being done, this might vary a little, but in general we have a static set of users who exercise the system over and over again. Some load test systems call these “threads” or “virtual users.” Each user generates a request, waits for a response, and then makes more requests until the test script is completed.
To generate load, the system runs the test script over and over until some “done” condition is hit. The script may allow some customization, such as users varying their actions a little or pausing between requests. The test itself may serve several different purposes, such as probing memory constraints or checking whether the system will remain stable over a long period of time. Once the test has completed, you might examine data such as the average and fiftieth-percentile response times.
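To make that model concrete, here is a minimal sketch of such a load generator in Python. The target URL, the user count, the think time, and the test duration are all illustrative assumptions rather than the behavior of any particular tool.

```python
# A minimal sketch of the "closed" model described above: a fixed pool of
# virtual users, each looping through a script until the test duration ends.
import statistics
import threading
import time
import urllib.request

TARGET_URL = "http://localhost:8080/"  # hypothetical endpoint under test
VIRTUAL_USERS = 10                     # static set of users ("threads")
TEST_DURATION_S = 30                   # "done" condition: elapsed time
THINK_TIME_S = 0.5                     # optional pause between requests

response_times = []
lock = threading.Lock()

def virtual_user(stop_at):
    # Each user sends a request, waits for the response, records how long it
    # took, pauses briefly, and repeats until the test is over.
    while time.time() < stop_at:
        start = time.time()
        try:
            urllib.request.urlopen(TARGET_URL, timeout=30).read()
        except Exception:
            pass  # a real tool would record the error separately
        elapsed = time.time() - start
        with lock:
            response_times.append(elapsed)
        time.sleep(THINK_TIME_S)

stop_at = time.time() + TEST_DURATION_S
threads = [threading.Thread(target=virtual_user, args=(stop_at,))
           for _ in range(VIRTUAL_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("average response time:", statistics.mean(response_times))
print("fiftieth percentile:  ", statistics.median(response_times))
```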
While I don’t think this model of load testing is broken for all possible load test goals, it certainly is for the vast majority of scenarios I have worked on. Have you found any issues yet?
Traffic in the Real World versus a Test
Stepping back from computers for a moment, imagine you are in a car, driving on the highway. You’re cruising at the legal speed limit, and there are about nine cars nearby. We might say there are ten users of this particular stretch of highway at this particular time. We could claim that in normal conditions, the highway can process ten cars per second on the given stretch of road. Then an accident happens. You go from a four-lane highway to a single lane. You and the other cars nearby can no longer move at ten cars per second. Traffic is making it difficult to navigate, so you slow down to half the legal speed limit. Assuming that it takes an hour for the wreck to be cleared away, what do you think will happen?
In the real world, you would see a traffic jam. The authorities might close down the highway to allow workers to clear the wreck. That would certainly make travel slow. Assuming that traffic remained constant, ten additional cars would arrive at the traffic jam every second, while fewer would leave.
If our load test model were parallel to the real world, it too would keep generating traffic in spite of the traffic jam. However, our load test has a maximum of only ten users to test with. That means the system would never see the load it would face in a real-world scenario. Your ten users would be going at half speed, for sure, but the analogous additional cars would never appear. You would never see a traffic jam. You couldn’t; you only have a maximum of ten users. In a sense, your load test eases off the system, letting it recover more quickly. After all, there’s no need to shut down the highway if you only have ten cars waiting. Azul Systems cofounder and CTO Gil Tene coined the term “coordinated omission” to describe this problem.
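A toy simulation makes the gap visible. Assuming requests normally take one second, slow to two seconds during a sixty-second “accident,” and that real-world traffic would keep arriving at ten requests per second (all numbers chosen to mirror the analogy, not measured), the closed model quietly offers far less load than the real world would:

```python
# Toy simulation of the traffic jam: requests normally take 1 second, but
# between t=60s and t=120s the "accident" slows them to 2 seconds.
def service_time(t):
    return 2.0 if 60 <= t < 120 else 1.0  # seconds per request at time t

TEST_LENGTH_S = 180.0
USERS = 10  # the closed model's fixed pool of virtual users

# Closed model: each user sends its next request only after the previous one
# finishes, so the arrival rate sinks whenever the system slows down.
closed_requests = 0
for _ in range(USERS):
    t = 0.0
    while t < TEST_LENGTH_S:
        closed_requests += 1
        t += service_time(t)  # the user waits for the response before continuing

# Open model: real-world traffic keeps arriving at 10 requests per second no
# matter how slowly the system responds, so a backlog builds up at the jam.
open_requests = int(10 * TEST_LENGTH_S)

print("closed model issued:", closed_requests, "requests")  # roughly 1,500
print("open model offered: ", open_requests, "requests")    # 1,800
```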
Lying to Ourselves with Data
Worse yet, the statistics you gather would be completely wrong. Imagine you have a load test running for a nice round one hundred seconds, and for each second you have a hundred transactions from a single user. That means you should see one transaction every ten milliseconds, or ten thousand transactions by the time the test is done.
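Here is that arithmetic, along with an assumed example of how a stall shrinks the sample population; the one-second freeze is my illustration, not a figure from the scenario above:

```python
# The arithmetic from the paragraph above.
DURATION_S = 100   # a nice round one hundred seconds
RATE_PER_S = 100   # one hundred transactions per second from one user

interval_ms = 1000 / RATE_PER_S              # 10.0 ms between transactions
expected_samples = DURATION_S * RATE_PER_S   # 10,000 transactions in total
print(interval_ms, expected_samples)

# Assumed example: if the system freezes for one second, the single user has
# only one request outstanding, so it records a single ~1,000 ms sample where
# real-world traffic would have felt roughly one hundred delayed transactions.
STALL_S = 1
recorded_samples = expected_samples - RATE_PER_S * STALL_S + 1
print(recorded_samples)  # 9,901 samples instead of 10,000
```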