Manage the Risks and the Process

Speeding the Software Delivery Process, Part 3
Summary:

Including a testing/QA component early in a software project necessarily prolongs the schedule, right? Not so, according to Ross Collard. In this, the third of a three-part series, Collard explains how to anticipate risks and to aggressively manage the process to prevent disaster.

In the first two articles in this series, I argued that speed doesn't necessarily sacrifice quality. Software can be developed faster and better, with the help of efficient testing and quality assurance activities. I listed the following ways of reducing the test cycle time and speeding the overall delivery process:

  1. managing testing like a "real" project
  2. strengthening the test resources
  3. improving system testability
  4. getting off to a quick start
  5. streamlining the testing process
  6. anticipating and managing the risks
  7. actively and aggressively managing the process 

The first two articles of this series discussed points 1-5. This article will finish the list with points 6 and 7.

6. Anticipate and Manage the Risks

Organize the risk management process. In my observation, the risk management skills of many software engineers, test and QA professionals, and even project leaders are seriously underdeveloped.

When you say to people: "Manage the risks," their answer is "We are already doing that," "That's obvious," or "The risks can't be managed any better." Sometimes you feel like you're explaining the risks of climbing Mt. Everest to a Sunday jogger.

The following checklist can be useful as a series of reminders of the risks to watch out for. If some of these points are likely to apply to your project, it is a good idea to identify them early and see what can be done to minimize them. (For completeness, some of the points on this list repeat points I have made elsewhere.) And a friendly warning: reading the list may be depressing when you realize how many points apply to your situation.

The common causes of test project slippage, or of inadequate testing within the time allocated, are:

People Causes
Under-staffing the test team. (There are many reasons for this, and they may be difficult to overcome.)

Contention for scarce people resources.

Adding people to the test team too late to help, usually after the first third of the project (Brooks's Law).

Lack of sufficient experience in the test team, whether in the functionality being tested, the test methodology, or the tools.

Lack of expertise in specialized aspects of testing, such as security controls testing, reliability testing, and usability testing.

Lack of sufficient user involvement and cooperation in the testing.

Lack of sufficient developer involvement and cooperation in the testing.

Test team learning curves that are longer than anticipated.

Tester turnover.

Fragmentation: assignment of people to too many projects in parallel, leading to time juggling.

Lack of access to important information, or miscommunication.

Failure to coordinate with other groups (specifically the system developers, system maintainers, users, or marketers), or outright conflict with them.

Lack of sufficient allowance for the overhead of organizing the work and of monitoring and reporting the status of the test project.

Unreasonable deadline pressures, which lead to demoralized testers and burnout.

System or Product Causes
The system version delivered to testing is too raw, buggy, and unstable to test effectively.

Scope creep in the product.

Volatility: frequent changes to the system functionality.

Test Process Causes
Unfamiliarity of the testers with the test process to be used for the system they are testing.

Revising the testing objectives during the test project.

Scope creep in the testing project, e.g., expansion of the types of testing to be undertaken.

False test results.

Test cases that provide untrustworthy information.

Lack of reusable test plans, cases, and procedures.

Slow, cumbersome, and inefficient test procedures (e.g., spending a lot of time looking for prior test cases for reuse).

Ineffective or poor-quality bug fixes, which may introduce new bugs or fail to fully resolve a problem, leading to extra debugging followed by extra testing.

Failure to adequately monitor, control, and document the testing activities.

Unstructured or exploratory testing that is too disorganized, and that therefore confuses and delays rather than illuminating the situation.

Unhelpful problem reports.

Ineffectual bug advocacy.

Development Process Causes
Low priority given to debugging and fixing, which leads to long defect aging (long bug-fix turnaround times).

High rate of insertion of new defects with fixes.

Fixes that do not resolve the problems, requiring cycles of refixing.

Test Environment Causes
Inadequate tools or equipment available for the testing.

Delays in obtaining the needed testware, e.g., test tools and facilities.

Gremlins in the testware.

Difficulty in using the testing tools, either because of limitations in the tools themselves or in the skill levels available to utilize these tools.

Underestimation of the effort needed to climb the learning curve with new testing tools, facilities, and procedures.

Difficulty in test automation.

Corrupted, unrepresentative, or untrustworthy test databases.

Project Management Causes
Lower priority routinely given to testing activities, versus development or other project activities.

Lack of early tester involvement in the system development project, so there is less time to prepare.

Lack of clear, agreed-on system acceptance criteria (and thus test completion criteria).

Taking testers away from critical tasks, such as running test cases, for noncritical tasks, such as required attendance at meetings on unrelated topics.

Vague, general test plans.

Out-of-date test plans.

Unrealistic estimates.

Lower priority routinely given to the testers in the competition for scarce resources.

Beginning the testing prematurely, before the test entry criteria have been met, leading to test rework.

Unwillingness of the senior managers to make timely decisions on which the test team is waiting.

Lack of a code freeze: undisciplined, last-minute additions or changes to features, which may invalidate the performance measurements.

Lack of reliable test project status information for monitoring the testing project and tracking progress against the plan.

Significant underestimation of the number of cycles of performance measurement, evaluation, and tuning needed before the system is ready to go live.

Lack of contingency plans for events that do not happen as expected.

Unplanned waits for other groups to do things on which the testers are dependent.

Risks are situation-specific, and this certainly is not a complete list. What other nontrivial risks would you add to this list?

7. Actively and Aggressively Manage the Process

Be decisive, and ready to react quickly as conditions change. Conditions always change as projects progress, and the responses to these changes need to be nimble, not ponderous. "Fast-smart" decision making is needed, rather than mindless adherence to a partly obsolete test plan, or endless meetings to decide what to do. Slow responses, even if correct, may be too late.

Develop a workable schedule with frequent milestones to use in tracking the testing project. We want to obtain early warning that things are going off track. The best way to identify delays and bottlenecks is to have a detailed and realistic test project plan, containing frequent interim milestones at which actual progress can be compared easily with the plan in order to identify deviations. The testing project needs to have "inchstones" (pebbles) as well as milestones.

Doing this requires strong project management skills and a lot of savvy about what it really takes to get a testing project done.

This project plan needs to be updated as conditions change, of course, to still be accurate and usable. This means the project plan should be easy to maintain, preferably with a project management software package.
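To make the plan-versus-actual comparison concrete, here is a minimal sketch in Python (the milestone names and dates are hypothetical, for illustration only) of how a test lead might flag slipped and overdue milestones:

    from datetime import date

    # Hypothetical "inchstones": (name, planned date, actual date or None if pending).
    milestones = [
        ("Test environment ready",    date(2024, 3, 1),  date(2024, 3, 4)),
        ("Test cases drafted",        date(2024, 3, 8),  date(2024, 3, 8)),
        ("First test cycle complete", date(2024, 3, 15), None),
    ]

    def report_slippage(milestones, today):
        """Flag milestones that finished late or are pending past their planned dates."""
        for name, planned, actual in milestones:
            if actual is not None and actual > planned:
                print(f"SLIPPED  {name}: {(actual - planned).days} day(s) late")
            elif actual is None and today > planned:
                print(f"OVERDUE  {name}: {(today - planned).days} day(s) past plan")

    report_slippage(milestones, today=date(2024, 3, 18))
    # SLIPPED  Test environment ready: 3 day(s) late
    # OVERDUE  First test cycle complete: 3 day(s) past plan

Even a report this simple makes the earliest slips visible while there is still time to react to them.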

Aggressively fight slippage. Sometimes people are complacent when their project slips a little, especially early in the project. They figure that they have lots of time. Only after an accumulation of several slips in the schedule do they begin to pay attention. The earliest slips, though, should be taken as early warning signs.

The test team needs to address promptly why the project is slipping, and resolve the problem. The idea is to monitor both the preventive QA efforts and the test efforts carefully to determine when they are falling behind, and to take quick corrective actions.

Of course, these comments assume that we know we are slipping in the first place. If the project objectives and scope are imprecise, or if there is not a reasonably detailed workplan with scheduled milestones, there is no way to detect the slippage.

Predict and track the likely causes of delay. Once a week on the testing project, take the time to look ahead and identify the likely causes of delay over the next two, three, and four weeks. Keep a close eye on these.

Aggressively manage defect aging. Set turnaround goals for the resolution of reported problems, based on levels of severity; e.g., showstoppers should be fixed and ready for retesting within twenty-four hours. Monitor the level of compliance with these goals.
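As a sketch of what monitoring that compliance might look like, the following Python fragment checks open defects against severity-based turnaround goals (the severity names, goals, and defect data are illustrative assumptions, not prescriptions from this article):

    from datetime import datetime

    # Hypothetical turnaround goals, in hours, by severity level.
    AGING_GOALS = {"showstopper": 24, "major": 72, "minor": 168}

    # Hypothetical open defects: (id, severity, time reported).
    open_defects = [
        ("BUG-101", "showstopper", datetime(2024, 3, 18, 9, 0)),
        ("BUG-102", "major",       datetime(2024, 3, 15, 14, 0)),
    ]

    def check_aging(defects, now):
        """Report each open defect whose age exceeds the goal for its severity."""
        for bug_id, severity, reported in defects:
            age_hours = (now - reported).total_seconds() / 3600
            goal = AGING_GOALS[severity]
            if age_hours > goal:
                print(f"{bug_id} ({severity}): {age_hours:.0f}h old, goal is {goal}h")

    check_aging(open_defects, now=datetime(2024, 3, 19, 12, 0))
    # BUG-101 (showstopper): 27h old, goal is 24h
    # BUG-102 (major): 94h old, goal is 72h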

Have an early warning process to identify test bottlenecks and resolve them quickly. Vigorously monitor defect aging, as mentioned earlier; long defect aging is a major cause of temporary blockages in testing.

Testing is often delayed for the most mundane of reasons, which attention to detail and smart test project management can avoid. Seemingly trivial and senseless reasons for delays include not having the test environment ready on time, having testers who have never formally been trained to use the automated test tools they are using, and "tiny" last-minute improvements by developers who cannot leave well enough alone. The list could go on and on.

Measure how time is being used. The first step in the management of an individual's time or a team's time is to understand where the hours are being spent. For example, a Bell Labs study found that software engineers spent only thirteen percent of their days on actual programming tasks. The remainder of the time was frittered away on other activities, such as interruptions, unscheduled meetings, phone calls, and tasks unrelated to meeting the deadline.
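A first cut at this measurement can be as simple as tallying logged hours by category, as in this small Python sketch (the categories and numbers are purely illustrative):

    from collections import defaultdict

    # Hypothetical one-week time log for a tester: (category, hours).
    time_log = [
        ("running tests", 11.0), ("meetings", 9.5), ("interruptions", 6.0),
        ("writing test cases", 7.5), ("email/phone", 4.0), ("bug reporting", 2.0),
    ]

    # Total the hours per category, then show each category's share of the week.
    totals = defaultdict(float)
    for category, hours in time_log:
        totals[category] += hours

    week_total = sum(totals.values())
    for category, hours in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{category:20s} {hours:5.1f}h  {100 * hours / week_total:4.1f}%")

Seeing the percentages in black and white is usually enough to start a conversation about reclaiming the frittered-away hours.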

Be proactive, not passive. Don't procrastinate and "go with the flow" before you raise concerns. Often, the politics of projects cause the testers to be fairly low in the pecking order, and it is easy for the testers to be passive and take their direction from developers or others, rather than taking the initiative themselves. While there is a danger of going too far when testers may not be aware of certain business reasons for a delay, a strong and independent test team can speed the delivery process.

Don't allow interpersonal conflict to get in the way. There is often "bad blood" between the developers and the testers, or at least an unwillingness to cooperate, an inability to see each other's point of view, or miscommunication. Ultimately, we are all in the same boat: we win or lose together. Working effectively with developers, users, and managers does not mean that a tester has to compromise quality principles.

Don't allow a crisis to derail the project. It is an unusual project that does not have at least one crisis or major hassle sooner or later. It is how we handle the crisis that counts. Crises are miserable: there may be finger-pointing and recriminations, high stress, screaming, and exhausting long hours until the crisis is resolved.

With motivation, teamwork, and trust, the team will get through the crisis. Without these, everything can fall apart. Treating people fairly and building a strong team spirit make the difference when people are called on and asked for an extraordinary effort to resolve the crisis.

Pray for good fortune. We need all the good fortune we can get.

When Haste Can Make Waste
Despite the good intentions behind these ideas on how to speed the testing and development processes, unless we are careful they can be counterproductive. Just ask anybody who has installed the latest hot software product or product upgrade and lived to regret it.

In the words of James Bach: "Finding the important problems fast is about as axiomatic as it gets in this business." James is correct. The whole idea of risk prioritization is this: since we can't test everything, we have to perform triage. In this triage, the perceived risks and vulnerabilities of the system are used to focus the depth and intensity of the testing in the areas where the payoff in bugs found is likely to come most quickly.

Nevertheless, I have been around testing projects that were run with a "get the biggest bugs fast" philosophy, but that were ultimately less than successful.

The basic issue here is the short-term versus the long-term perspective of what constitutes success, like the tortoise and the hare. Imagine, for example, that a general before going into battle says to the assembled troops: "Find and kill the king first and then the most important enemy generals as soon as possible. And eliminate their most threatening warriors and leaders before we bother with the remainder who are the unmotivated followers."

It's a good strategy; if we want to win the battle, we would agree with it. Though the country where the battle occurs may be in ruins for decades afterward, and the victory could contribute to a new war against the resentful survivors in twenty years, that is totally irrelevant in the short-term view on the day of the battle. If we look at bug eradication as war, and many argue this is the only legitimate view, then this quick-kill approach is the one we need to take.

Imagine another situation, a software development project where the team leader says: "Getting the most important software modules done as fast as possible is the most important thing in this project." We might choose as quality professionals to disagree with this statement, for several valid reasons.

On a software development project, these reasons can include concerns that:

  • the system design is not very good in the rush to code (for example, opportunities to build in defect detection and recovery processes have been overlooked);
  • the apparently easy, minor, and simple software components might receive less care in their construction;
  • the system documentation may be a skimpy afterthought;
  • the system maintainability, and the potential to reuse components on other future projects, may not be very good; and
  • items of importance to the user, such as the usability of the system, might be compromised in the rush to get the most important software modules done as fast as possible.

In testing projects, the attitude of "let's get the biggest bugs first and fastest" might lead to these side effects:

  • There is little time to plan: we have to get going and find those bugs! In the long run, this means the testing may be more haphazard and possibly less reliable.
  • There is evidence that with intuitive testing methods, the test coverage is likely to be unknown and overestimated: the actual coverage is much lower than what the testers think it is. See the discussion of the dismal results with intuitive testing in the famous triangle experiment.
  • There is little time during the testing process to document test cases and procedures, so that the future reuse of these test cases and procedures is compromised.
  • The test log (the audit trail of exactly what happened in the execution of each test case) is filled out in a skimpy manner in the rush to find and fix the bugs.
  • In the rush to find the most important bugs, the juniors on the project are more likely to be assigned "grunt work" rather than difficult assignments that can stretch and grow them, because the most senior, experienced testers are doing the core work and the most difficult testing.
  • Practices like testers' walkthroughs of each other's test cases are discouraged, because they are perceived as getting in the way of the "real work" of running test cases.
  • The reuse of testware, such as test case libraries, is likely to be lower if it has been designed for one-time use and with little attention paid to its future use.
  • In the relentless march on to the next testing project after this one is finished, little or no time is allocated for debriefing and reflection on the lessons learned and how the testing process can be improved.

In other words, it is possible to win the battle and lose the war. On the other hand, if we do not win the battle, we may not be around for the rest of the war.

Imagine a software project manager who says to the software engineers on a project: "Our job is to deliver a trustworthy system that satisfies the user. The feature set should work and be complete, and the users' critical success factors (CSFs) need to be met. Getting the most important software modules coded fast is a strategy worth considering in building the system, but it is not necessarily the best way to achieve user satisfaction in the long term." (The guy would probably be lynched and never get the chance to show how the alternatives to super-fast code delivery could pay off.)

Or a test team leader who says: "Naturally, it is critical in testing to find the most important problems fast. But that is not the only criterion for the success of our test project. Other success factors include developing better insights into the nature and sources of bugs, building effective long-term working relationships with the system developers and others, developing the junior testers of today who in three years will become members of our inner core group of scarce, highly knowledgeable testers, leaving behind reusable test case libraries, and so on."

What's the point here? That we should not try to find the most critical bugs as early as possible? No, that's not the point. We still need to find the important bugs fast, but we defeat ourselves if we neglect larger strategies that will produce greater efficiency in the long run.

In Summary
There are many things testers can do to speed the system delivery without sacrificing quality. I have presented a few ideas which I have tried myself and seen work, or seen work in the hands of others. The ideas are generally practical and implementable. The question is, how can you apply these ideas in your situation? The potential rewards certainly make the question worth sincere consideration.

Read Part 1: "Manage and Strengthen Testing"
Read Part 2: "Conduct Early and Streamlined Testing"
