Why Is Estimating Software Testing Time So Difficult?

Management loves to ask testers to estimate how long their efforts will take. But so many important aspects elude measurement that testing time is difficult to predict. Here are some of the major factors that significantly influence our ability to estimate testing time well, along with some advice on how you can tighten up your efforts.

I recently got this email from my friend Carol:

I need a fairly scientific way to estimate testing time. Today, I know how long my test cases take to run individually, I know there will be some number of bugs, I know the fixes will take some period of time. I know I will need to rerun tests, etc. Is there a formula that helps with estimating this? I realize it will not be exact, but something that other companies do to make estimating more of a science than a feeling. I hope you have an exact answer for this question. My boss is going to ask me for this information on Monday, so no pressure but HELP!

Carol asks an important question. Management tends to think of software development as an investment, like buying a car or a house. As with those big-ticket purchases, there are plenty of other options to choose from, so management wants to know the benefits, the time to build, and the cost.

Those seem like reasonable requests, at least at first. Then we run into Carol’s questions, which make things more challenging. Sadly, the reality is that our guesses at how long tests take to run are often wrong, we likely don't get to put in forty hours of productive testing in any given week, we are often waiting for new builds and fixes, and the rate of failure means we’ll need to rerun tests, often more than once.

This means a terrible amount of uncertainty in the estimating process. Add a new team, technology, or process, and suddenly the uncertainty is over the edge; coming up with a schedule estimate for testing starts to feel less like science and more like an irresponsible guess.

The following factors significantly influence our ability to estimate testing time well, but with a little effort, you can tighten up the process.

In her email, Carol indicates her lack of knowledge about her situation. She states that fixes will take “some period of time” and there will be “some number of bugs.” Like Carol, most organizations lack sufficient historical data to build estimates from. Without the data of experience, it will be difficult to create accurate estimates.

So start gathering data! Over the next two weeks or so, try to figure out what percentage of your time is spent on rework. If it’s 30 percent, only 70 percent of your time goes to planned testing, so divide the planned test effort by 0.70 (that is, multiply it by about 1.43) to find the real effort.
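The rework adjustment is simple enough to put in a few lines of Python. This is only a sketch: the function name and the 100-hour plan are made up for illustration, and it assumes the rework percentage you measured will hold for the next project.

```python
def real_effort(planned_hours: float, rework_fraction: float) -> float:
    """Scale planned test effort up to account for time lost to rework.

    rework_fraction is the share of total time spent on rework (0.30 = 30%).
    """
    if not 0 <= rework_fraction < 1:
        raise ValueError("rework_fraction must be at least 0 and below 1")
    # Only (1 - rework_fraction) of each hour goes to planned work.
    return planned_hours / (1 - rework_fraction)


if __name__ == "__main__":
    # Hypothetical plan: 100 hours of planned testing, 30 percent rework.
    print(round(real_effort(100, 0.30), 1))  # 142.9
```

The multiplier grows quickly as rework rises: at 50 percent rework the planned effort doubles.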

The next key factor is the test team itself. How large is the team? What is each member’s personal level of skill and experience? Do they have a well-defined testing process that everyone understands and can select from? How stable is the team? Do members come and go randomly, or do they have a cohesive history? How much time can the team focus on testing tasks without interruption? And what are the individuals’ interaction skills? These answers are all vital to the team’s performance and, thus, the estimates for testing time, but we have no good ways of measuring these characteristics. Lacking such measures, ask yourself how much final schedules differ from the planned ones and how that difference is changing over time. If it’s getting worse, you need more time. If it varies with no clear trend, use your most recent project as the guide.

Another factor in good estimates is the stability of the requirements. We no longer “freeze the requirements” like we used to. In today’s agile world we welcome change, and with those changes to requirements will come changes in testing—and the estimates. Product owners flex scope to hit dates; take a look at flexing testing to hit deadlines.

System size, complexity, and risk are also key factors that influence the amount of testing that “should” be performed. And again, we have no effective ways of measuring these factors. In his book The Principles of Product Development Flow, Donald Reinertsen says larger projects slip not only by larger amounts, but also by larger percentages. When you look at how far your estimates are off, look at projects that looked to be of similar size at the beginning.

A key factor in estimating the testing effort (and other unknowns) is the defect density in the requirements, design, and code. Buggy requirements and design will result in buggy code. How bad will it be? How many defects will be delivered to testers? That factor has a substantial impact on the amount of time testing will require. Again, looking at similar projects can help here.

In her email, Carol mentioned that developer fixes will require some time. Another influencing factor is the developer “screw-up rate” when fixing defects. The general feeling is that about 5 percent of the “fixes” either will not fix the original problem or will break something else in the product. But what is the ratio at Carol’s organization? We don’t know, but it would help to find out.
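Once you know your own bad-fix rate, you can turn it into a retest count. The sketch below rests on a simplifying assumption that is mine, not Carol's data: every fix fails independently with the same probability, and each failed fix costs exactly one more fix-and-retest cycle. Under that assumption the expected number of cycles per defect is 1 / (1 - rate). The function name and the 100-defect figure are hypothetical.

```python
def expected_retest_cycles(defects: int, bad_fix_rate: float) -> float:
    """Expected number of fix-and-retest cycles for a batch of defects.

    Assumes each fix fails independently with probability bad_fix_rate and
    that a failed fix triggers exactly one further cycle (a geometric model).
    """
    if defects < 0:
        raise ValueError("defects must not be negative")
    if not 0 <= bad_fix_rate < 1:
        raise ValueError("bad_fix_rate must be at least 0 and below 1")
    return defects / (1 - bad_fix_rate)


if __name__ == "__main__":
    # Hypothetical batch: 100 defects at the "general feeling" rate of 5 percent.
    print(round(expected_retest_cycles(100, 0.05), 1))  # 105.3
```

At 5 percent the overhead looks small, but fixes that break something else also create new defects to find, which this simple model does not count.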

Another factor that must be taken into account in test estimation is the required thoroughness, or coverage, of the testing. Is this a cribbage game app in which minor errors might be acceptable, or a drug infusion system where errors can be deadly? Does the system have zillions of paths that each require a unique test, or does the system have myriad combinations of data it must process correctly each time? This can be incredibly hard to calculate, but here’s an idea: Get a handle on what management expects for a schedule, what you can cover in that time, and what would be left uncovered. See if they find that acceptable—or if that inspires them to give you more time. (Another way to do it: Explain what kind of coverage you could generate while still keeping up with the programmers.)

The availability and reuse of previous test assets and environments can significantly change the time required to test. Unfortunately, there are no generally accepted ways to measure test reusability to factor it into the estimation process. If reuse is low, remember that test design and brainstorming aren't free. Even teams that do session-based test management and try to push design into the work need to come up with the charters for the work.

Lastly, good test estimation is just plain hard work. Software developer Joel Spolsky’s evidence-based scheduling method has four steps: (1) Break the planned testing tasks down into small chunks (without omitting any important ones), (2) Track the actual elapsed time, (3) Simulate the future using the Monte Carlo method, and (4) Manage your project actively. He claims good success with test estimation using this method, but who really wants to go to all that work? Not many companies I know of.
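The simulation step is less work than it sounds. Here is a minimal sketch of the idea, not Spolsky's actual implementation: for each past task, compute a velocity (estimated hours divided by actual hours), then repeatedly draw a random past velocity for every remaining task and add up the adjusted times. The function names, the sample history, and the task estimates are all invented for illustration.

```python
import random


def simulate_totals(history, remaining_estimates, rounds=10_000, seed=0):
    """Return sorted simulated totals (in hours) for the remaining tasks.

    history: list of (estimated_hours, actual_hours) pairs for finished tasks.
    remaining_estimates: estimated hours for tasks not yet done.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    velocities = [estimated / actual for estimated, actual in history]
    totals = []
    for _ in range(rounds):
        # Each task is adjusted by one randomly chosen historical velocity.
        total = sum(est / rng.choice(velocities) for est in remaining_estimates)
        totals.append(total)
    totals.sort()
    return totals


def percentile(sorted_totals, fraction):
    """Pick the value at the given fraction (0.0 to 1.0) of a sorted list."""
    index = min(len(sorted_totals) - 1, int(fraction * len(sorted_totals)))
    return sorted_totals[index]


if __name__ == "__main__":
    # Hypothetical history: (estimated, actual) hours for past test tasks.
    past = [(4, 5), (8, 8), (2, 4), (6, 9), (3, 3)]
    todo = [4, 8, 2, 6]  # estimates for the upcoming test tasks
    totals = simulate_totals(past, todo)
    print("50% chance of finishing within", round(percentile(totals, 0.5), 1), "hours")
    print("90% chance of finishing within", round(percentile(totals, 0.9), 1), "hours")
```

The result is a range with probabilities attached rather than a single number, which may be a more honest answer to give a boss on Monday.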

It’s no wonder that test estimation is so difficult. There are so many important factors that elude our measurement. And even if we knew most of them, a single unknown could skew the chart. In many cases, the deadline is given to us. It makes me wonder if perhaps we would be better off investing the time we would spend estimating in doing actual testing instead.

User Comments

Mike Talks

Like you said, I'm a huge fan of sizing by comparison.  Finding something of similar complexity and asking "well how long did THAT take", and using it as the initial ballpark number.
Then asking "well - why did it take so long?", and those points can become your risks you might want to try and control.  Maybe some of those things aren't going to happen here, so you can factor down on that assumption a little.
But like all disciplines, we can be way too optimistic estimating.  Also what we're typically asked is "how long to test", but what people really mean is "how long to fix".  Well ... a lot of that's outside the hands of testers.  Hence using experience really helps justify vs "this is a number I just plucked out the air ... then multiplied by 3".

September 13, 2016 - 8:15pm
John Wilson

I wonder if breaking down the estimate may help develop the accuracy of the elements, or at least highlight the elements where the estimates are off by a lot. Examples of elements could be something like: design tests, write tests, execute tests, wait for build, defect analysis, defect writing, wait for defect fix, retest defect.

September 14, 2016 - 4:32am
Tim Thompson

Great article, but it is entirely based on the fact that someone bothers to ask how long testing takes. I worked on various developer led teams and testing is nothing more than a line item to check off at some point before release date, no matter if it happened or not. Asking for estimates? Yea, there was a time when that happened, that was before cramming in more features became the preference over delivering quality. These days there are no requirements, what needs to get built is designed on the fly, often kludged together based on the first idea that might work, and when release date comes along stuff gets shipped no matter what QA objects to.

Before QA estimates matter we need to see that QA matters again.

September 17, 2016 - 7:42am
Priten Shah

Very well written content exploring why it is so difficult to decide the software testing time. In my personal experience there are many obstacles, like different resources, surprise errors at run time, and many more. It also depends on which type of software testing solutions and tools one is going to use to derive the results.

I found your article very useful one. Keep up the good work in future too and share this kind of quality content more.....

September 29, 2016 - 5:53am
Ahmed ali

Sir  is it possible that we can check if email already exist in input field and if already exist we can change it by loop without going or access from database ???

October 6, 2016 - 8:46am
Moritz Beller

This article is very interesting, we did many of the things you suggest in a scientific study with more than 400 developers. We assessed, for example, testing time, testing run time and execution frequency, and compared it to other variables like programming experience. If you are interested, the paper will be available for a limited amount of time for free: http://inventitech.com/publications/2015_beller_gousios_panichella_zaidm...

Moreover, we have implemented the techniques we used as actual tools anyone can use off the shelf, in the opensource tool WatchDog (https://testroots.org/) for IntelliJ and Eclipse. WatchDog gives you nice reports, which allow you to conveniently access your testing analytics and see how well your estimations on testing were. 

We have not yet looked into the relationship with requirements, but it does sound like a fruitful future avenue.

October 19, 2016 - 6:11pm

About the author

StickyMinds is a TechWell community.

Through conferences, training, consulting, and online resources, TechWell helps you develop and deliver great software every day.