Setting: Our tester, Tim, is verifying load performance of a server. He has
been waiting for his chance to use the server to run his tests. While he’s
waiting for the developers to finish, he realizes that if the server dies, he
can’t verify the load performance of the application. Tim makes a beeline for
the office of Pam, the project manager.
Tim: "Hey, did you know this server is critical to our ability to load
test?"
Pam: "Hmm, no, I didn’t realize that." (Pam goes back to
reviewing the schedule.)
Tim: "Well, I want to get another one, okay?"
Pam: "What?! No, you can’t have another server. If you get another
server, other people will want more servers, and then our budget will be
shot."
Tim: "But if we don’t have the server at all, I won’t be able to
test."
Pam: "Hmm, then our bug counts will go down. That’s not bad."
(Tim glares at Pam.)
Pam: "Okay, then it’s your job to tell me how likely the equipment is
to break and how much it will cost to fix."
Ever had a conversation like this with a project manager? I hope not. But if
you have, you probably walked away furious and disgusted, convinced that the
project manager really didn’t care what your answer was. Even so, you still
have to bring this information to the project manager’s attention, so that she
can take a more responsible approach to managing the potential issue.
Potential issues are risks. In formal risk analysis, you consider the
likelihood that a potential issue will occur and the severity if it does;
together, these give you the exposure. Then you create a mitigation plan to
deal with the problem. Testing is one form of risk mitigation: you look for
defects before the customers find them. But it’s not the only form of risk
mitigation you’re likely to need.
Sometimes mapping out the risk can be helpful. I use a table like this one to
explain risks:
| Risk | Probability of occurrence | Risk severity | Exposure | Trigger date | Mitigation plan |
|---|---|---|---|---|---|
| Define this in words | How likely is this risk to occur? Use high, medium, low | How severe a problem is this risk if it occurs? Use high, medium, low | Multiply probability and severity together, to derive a joint value | The date by which you will set the mitigation plan in place | What are you going to do about this risk? |
| Load server may not be available in time to test | High (it was in use by other groups for the last release) | High (we can’t test performance under load without the server) | (High, High) | 2/1 | Buy a new server, install it by 2/15, up and running by 3/1, in time to start load testing |
First, I define the risk in words people can understand. Here, we’re
talking about a particular server's availability. Then, I define the probability
that this problem could occur. If you have historical information, use it. If
you don’t have any previous knowledge, then guess. In my example, we know that
in the previous release, other groups also needed to use the load server.
Then, define the severity of the risk. How bad a problem is this, if it
occurs? In this case, the potential problem is very bad, assuming we need to
test the product under load. Once you’ve defined the probability and severity,
you can multiply them together to get the exposure. Some people use numbers to
quantify risk, so they can easily multiply and get a single number. I find that
showing managers (High, High) in bright red is enough information. In my
experience, managers or other people with organizational power manipulate the
numbers to give the answer that other managers or powerful people want to see.
It’s much harder to manipulate the highs, mediums, and lows.
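To make the two ways of expressing exposure concrete, here is a minimal Python sketch. It is only an illustration, not a prescribed tool: the `Level` scale of 1 to 3, the `Risk` class, and the method names are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Assumed 1-3 scale; the article itself only uses the words.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Risk:
    description: str
    probability: Level
    severity: Level

    def exposure_label(self) -> str:
        """Qualitative exposure, written as a pair such as (High, High)."""
        prob = self.probability.name.capitalize()
        sev = self.severity.name.capitalize()
        return f"({prob}, {sev})"

    def exposure_score(self) -> int:
        """Numeric exposure: probability times severity on the assumed scale."""
        return int(self.probability) * int(self.severity)


load_server = Risk(
    description="Load server may not be available in time to test",
    probability=Level.HIGH,
    severity=Level.HIGH,
)

print(load_server.exposure_label())  # the pair a manager would see
print(load_server.exposure_score())  # the single number some teams prefer
```

The label keeps both inputs visible, which is the property that makes the words harder to massage than a single number.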
For all risks, define the date by which you need a mitigation plan: a plan to
manage the risk if it does come true. For high-exposure risks, define the
mitigation plan itself. (In your organization, you may also need a plan for
medium-exposure risks. Some organizations require plans for even lower-exposure
risks. It depends on the risk tolerance of your organization.)
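One way to picture the risk-tolerance rule is as a threshold applied to a list of risks. The Python sketch below is a rough illustration under stated assumptions: the names (`Risk`, `exposure`, `risks_needing_plan`, `plan_threshold`) are hypothetical, the second and third entries in the list are invented examples, and rating exposure as the lower of probability and severity is a simplification chosen for this example, not a rule from the text.

```python
from enum import IntEnum
from typing import NamedTuple


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Risk(NamedTuple):
    description: str
    probability: Level
    severity: Level
    trigger_date: str  # every risk gets a date, whatever its exposure


def exposure(risk: Risk) -> Level:
    # Assumption for this sketch only: rate exposure as the lower of the
    # two levels, so only (High, High) counts as high exposure.
    return min(risk.probability, risk.severity)


def risks_needing_plan(risks, plan_threshold=Level.HIGH):
    """Return the risks whose exposure meets the organization's threshold."""
    return [r for r in risks if exposure(r) >= plan_threshold]


register = [
    Risk("Load server may not be available in time to test",
         Level.HIGH, Level.HIGH, "2/1"),
    # The next two entries are hypothetical, added only to show the filter.
    Risk("Hypothetical: test lab network is slow",
         Level.MEDIUM, Level.MEDIUM, "2/10"),
    Risk("Hypothetical: spare disk fails",
         Level.LOW, Level.HIGH, "3/1"),
]

for risk in risks_needing_plan(register):
    print(risk.trigger_date, risk.description)
```

An organization with a lower risk tolerance would pass `plan_threshold=Level.MEDIUM` or `Level.LOW` and get a longer list of risks that need written plans.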
Now, when you go to your project manager to explain that there’s a
potential problem with testing, it’s easier for the project manager to see the
potential problems and how they impact the whole project.
Of course, risk management doesn’t give you a magic wand and a crystal
ball. Rather, risk management is about looking at likely scenarios (and
even at some unlikely scenarios) and taking some action to reduce their
potential effects.
Why do risk analysis?
You do risk analysis for only one reason: to find out whether you would manage
the project differently if any of your risks came true. I especially look for
risks that could put us out of business, or prevent us from shipping product.
When I work with people on generating risk scenarios in the project, I ask
them to look at risks in these areas:
- Risks to getting the project completed. (The machine availability problem
above is a great example of that risk.)
- Risks to using the product in the field. (I find use cases or other forms
of customer-scenario generation work well here.)
- Risks to the business from using the product in the field. (If your
customers find this problem, could their reaction impair your ability to
do business?)
Then you make plans to deal with the risks you never want to come true. In
Tim’s project, for example, not having the machine when it was needed would
prevent them from doing load testing. How bad a problem is that, really? Would
Tim’s company have shipped anyway? If so, then there was little real risk. (I’m
not saying this is good business, but if Tim’s company has already decided that
the potential downside, the severity, is not high enough, then there is only
limited business risk if a server is not available and the testing is not done.)
Risk analysis can’t be exact. If it were exact, you’d be predicting the
future (and by the way, if you figure out how to predict the future, please do
let me know). But having a place to start the discussion about what the problem
is, and how it affects the project, is much better than the frustrating and
inadequate dialog we saw at the beginning of this column.