Software engineering and software project management are complex activities. Both software development and software management have dozens of methodologies and scores of tools available, many of which are beneficial. In addition, quite a few methods and practices have been shown to be harmful, based on depositions and court documents from litigation over software project failures. To evaluate the effectiveness or harm of these numerous and disparate factors, we have developed a simple scoring method that runs from +10 for maximum benefit to -10 for maximum harm.
Software development and software project management have dozens of methods, hundreds of tools, and scores of practices. Many of these are beneficial, but many are harmful too. There is a need to be able to evaluate and rank many different topics using a consistent scale.
To deal with this situation, a scoring method has been developed that allows disparate topics to be ranked using a common scale. Methods, practices, and results are scored using a scale that runs from +10 to -10 using the criteria shown in table 1.
Both the approximate impact on productivity and the approximate impact on quality are included. The scoring method can be applied to specific ranges such as 1,000 function points or 10,000 function points. It can also be applied to specific types of software such as information technology, web application, commercial software, military software, and several others.
The midpoint or “average” against which improvements are measured is traditional application development, such as waterfall development performed by organizations that either do not use the Software Engineering Institute’s capability maturity model or are at level 1. Low-level programming languages are also assumed. This fairly primitive combination remains more or less the most widely used development method even in 2008.
One important topic needs to be understood. Quality needs to be improved faster and to a higher level than productivity in order for productivity to improve at all. The reason for this is that finding and fixing bugs is overall the most expensive activity in software development. Quality leads and productivity follows. Attempts to improve productivity without improving quality first are not effective.
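The economic point above can be shown with simple arithmetic. The percentages below are assumptions chosen for illustration, not measured values from the article:

```python
# Illustrative arithmetic only: the effort shares are assumptions,
# not measured values.

TOTAL_MONTHS = 100           # baseline effort for a hypothetical project
DEFECT_REPAIR_SHARE = 0.40   # assumed share spent finding and fixing bugs
OTHER_SHARE = 1 - DEFECT_REPAIR_SHARE

# If better quality practices cut defect-repair effort in half, total
# effort drops even though nothing else about the project changed:
improved = TOTAL_MONTHS * (OTHER_SHARE + DEFECT_REPAIR_SHARE / 2)
print(improved)  # 80.0 staff-months, a 20 percent effort reduction
```

Under these assumed shares, cutting defect-repair effort in half reduces total effort by 20 percent, which is why quality improvement is the lever that moves productivity.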
For software engineering a serious historical problem has been that measurement practices are so poor that quantified results are scarce. There are many claims for tools, languages, and methodologies that assert each should be viewed as a “best practice.” But empirical data on their actual effectiveness in terms of quality or productivity has been scarce. Three points need to be considered.
The first point is that software applications vary in size by many orders of magnitude. Methods that might be ranked as “best practices” for small programs of 1,000 function points in size may not be equally effective for large systems of 100,000 function points in size.
The second point is that software engineering is not a “one size fits all” kind of occupation. There are many different forms of software such as embedded applications, commercial software packages, information technology projects, games, military applications, outsourced applications, open-source applications, and several others. These various kinds of software applications do not necessarily use the same languages, tools, or development methods.
The third point is that tools, languages, and methods are not equally effective or important for all activities. For example, a powerful programming language such as Objective C will obviously have beneficial effects on coding speed and code quality. But which programming language is used has no effect on requirements creep, user documentation, or project management. Therefore, the phrase “best practice” also has to identify which specific activities are improved. This is complicated because activities include development, deployment, and post-deployment maintenance and enhancements. Indeed, for large applications, development can take up to five years, installation can take up to one year, and usage can last as long as twenty-five years before the application is finally retired. Over the course of more than thirty years, there will be hundreds of activities.
The result of these various factors is that selecting a set of “best practices for software engineering” is a fairly complicated undertaking. Each method, tool, or language needs to be evaluated in terms of its effectiveness by size, by application type, and by activity.
Overall Rankings of Methods, Practices, and Sociological Factors
In order to be considered a “best practice,” a method or tool has to have some quantitative proof that it actually provides value in terms of quality improvement, productivity improvement, maintainability improvement, or some other tangible factors.
Looking at the situation from the other end, there are also methods, practices, and social issues that have demonstrated that they are harmful and should always be avoided. For the most part, the data on harmful factors comes from depositions and court documents in litigation.
In between the “good” and “bad” ends of this spectrum are practices that might be termed “neutral.” They are sometimes marginally helpful and sometimes not. But in neither case do they seem to have much impact.
Although the author’s book Software Engineering Best Practices dealt with methods and practices by size and by type, it might be of interest to show the complete range of factors ranked in descending order, with the ones having the widest and most convincing proof of usefulness at the top of the list. Table 2 lists a total of 200 methodologies, practices, and social issues that have an impact on software applications and projects.
The average scores shown in table 2 are actually based on the average of six separate evaluations:
1. Small applications < 1,000 function points
2. Medium applications between 1,000 and 10,000 function points
3. Large applications > 10,000 function points
4. Information technology and web applications
5. Commercial, systems, and embedded applications
6. Government and military applications
The data for the scoring comes from observations among about 150 Fortune 500 companies, some fifty smaller companies, and thirty government organizations. Negative scores also include data from fifteen lawsuits. The scoring method does not have high precision and the placement is somewhat subjective. However, the scoring method does have the advantage of showing the range of impact of a great many variable factors. This article is based on the author’s two recent books: Software Engineering Best Practices published by McGraw Hill in 2009 and The Economics of Software Quality published by Addison Wesley in 2011.
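The six-way averaging described above can be sketched as follows. The practice name and per-context scores are hypothetical placeholders, not values taken from table 2:

```python
# Sketch of the six-context averaging described in the text.
# The sample scores below are illustrative, not values from table 2.

CONTEXTS = [
    "small (<1,000 FP)",
    "medium (1,000-10,000 FP)",
    "large (>10,000 FP)",
    "IT and web",
    "commercial/systems/embedded",
    "government/military",
]

def average_score(scores_by_context):
    """Average a practice's six per-context scores (each +10 .. -10)."""
    if len(scores_by_context) != len(CONTEXTS):
        raise ValueError("need exactly one score per context")
    return sum(scores_by_context) / len(scores_by_context)

# Hypothetical practice scored in each of the six contexts:
print(round(average_score([9, 9, 10, 9, 10, 9]), 2))  # 9.33
```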
However, the resulting spreadsheet is quite large and complex, so only the overall average results are shown here:
It should be realized that table 2 is a work in progress. Also, the value of table 2 is not in the precision of the rankings, which are somewhat subjective, but in the ability of the simple scoring method to show the overall sweep of many disparate topics using a single scale.
Table 2 is often used as a quick evaluation method for software organizations and software projects. From interviews with project teams and software managers, the methods actually deployed are checked off on table 2. Then the numeric scores from table 2 are summed and averaged.
A leading company will deploy methods that, when summed, total to more than 250 and average more than 5.5. Lagging organizations and lagging projects will sum to less than 100 and average below 4.0. The worst average encountered so far was only 1.8 and that was done as background to a lawsuit for breach of contract. The vendor, who was also the defendant, was severely behind in the use of effective methods and practices.
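The checklist evaluation just described can be sketched as a small function. The thresholds come from the text; the sample scores are hypothetical:

```python
def classify_organization(scores):
    """Apply the rough thresholds from the text: deployed methods that sum
    to more than 250 and average above 5.5 indicate a leading organization;
    a sum below 100 with an average below 4.0 indicates a lagging one."""
    total = sum(scores)
    avg = total / len(scores)
    if total > 250 and avg > 5.5:
        return "leading"
    if total < 100 and avg < 4.0:
        return "lagging"
    return "intermediate"

# Hypothetical organization deploying 50 methods with an average score of 6:
print(classify_organization([6] * 50))  # leading
```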
Note that the set of factors included are a mixture. They include full development methods such as team software process (TSP) and partial methods such as quality function deployment (QFD). They include specific practices such as “inspections” of various kinds, and also social issues such as friction between stakeholders and developers. They also include metrics such as “lines of code,” which is ranked as a harmful factor because this metric penalizes high-level languages and distorts both quality and productivity data. What all these things have in common is that they either improve or degrade quality and productivity.
Since programming languages are also significant, it might be asked why specific languages such as Java, Ruby, or Objective C are not included. This is because as of 2011 more than 2,500 programming languages exist, and new languages are being created at a rate of about one every calendar month.
In addition, a majority of large software applications utilize several languages at the same time, such as Java and HTML, or combinations that may top a dozen languages in the same application. There are too many languages and they change far too rapidly for an evaluation to be useful for more than a few months. Therefore, languages are covered only in a general way: are they high-level or low-level, and are they current languages or “dead” languages no longer in use for new development.
Unfortunately, a single list of values averaged over three different size ranges and multiple types of applications does not illustrate the complexity of best-practice analysis. Table 3 shows examples of thirty best practices for small applications of 1,000 function points and for large systems of 10,000 function points. As can be seen, the two lists have very different patterns of best practices.
The flexibility of the agile methods is a good match for small applications, while the rigor of TSP and PSP is a good match for the difficulties of large-system development.
Table 3: Best Practice Differences between 1,000 and 10,000 Function Points
Having considered best practices, it is useful to discuss their polar opposites: worst practices, at the other end of the spectrum.
The definition of a “worst practice” is a method or approach that has been proven to cause harm to a significant number of projects that used it. The word “harm” means either degradation of quality, reduction of productivity, or concealing the true status of projects. In addition, “harm” also includes data that is so inaccurate that it leads to false conclusions about economic value.
Each of the harmful methods and approaches individually has been proven to cause harm in a significant number of applications that used them. This is not to say that they always fail. Sometimes, albeit rarely, they may even be useful. But in a majority of situations, they do more harm than good in repeated trials.
A distressing aspect of the software industry is that bad practices seldom occur in isolation. The depositions and court documents of lawsuits for projects that were cancelled or never operated effectively show that multiple worst practices are usually used concurrently.
From data and observations on the usage patterns of software methods and practices, it is distressing to note that practices in the harmful or worst set are actually found on about 65 percent of U.S. software projects as noted when doing assessments. Conversely, best practices that score 9 or higher have only been noted on about 14 percent of U.S. software projects. It is no wonder that failures far outnumber successes for large software applications!
From working as an expert witness in a number of breach-of-contract lawsuits, I have observed that many harmful practices tend to occur repeatedly. These collectively are viewed by the author as candidates for being deemed “professional malpractice.” The definition of professional malpractice is something that causes harm that a trained practitioner should know is harmful and, therefore, should avoid using.
Following are thirty issues that have caused trouble so often that the author views them as professional malpractice, particularly when they occur for applications in the 10,000 function point size range. That is the range where failures outnumber successes and where litigation is distressingly common. Only one of the fifteen lawsuits where the author worked as an expert witness involved an application smaller than 10,000 function points.
Table 4: Candidates for Classification as “Professional Malpractice”
It is unfortunate that several of these harmful practices, such as “cost per defect” and “lines of code,” are still used on hundreds of projects without the users even knowing that “cost per defect” penalizes quality and “lines of code” penalizes high-level languages.
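A worked example makes the “lines of code” distortion concrete. All the numbers below are assumptions chosen for illustration: a single application of 1,000 function points, fixed non-coding effort, and coding effort proportional to code volume:

```python
# Illustrative example of why "lines of code" penalizes high-level
# languages. All rates and sizes below are assumed, not measured.

FUNCTION_POINTS = 1_000
MONTHLY_RATE = 10_000    # assumed dollars per staff-month
NONCODE_MONTHS = 40      # requirements, design, docs (same either way)
LOC_PER_MONTH = 2_000    # assumed coding rate in either language

def costs(loc_per_fp):
    loc = FUNCTION_POINTS * loc_per_fp
    months = NONCODE_MONTHS + loc / LOC_PER_MONTH
    total = months * MONTHLY_RATE
    return total, total / loc, total / FUNCTION_POINTS

low_total, low_per_loc, low_per_fp = costs(loc_per_fp=200)    # low-level
high_total, high_per_loc, high_per_fp = costs(loc_per_fp=50)  # high-level

# The high-level version is cheaper overall and per function point,
# yet looks *worse* on cost per line of code:
print(low_total, high_total)      # 1400000.0 650000.0
print(low_per_loc, high_per_loc)  # 7.0 13.0
print(low_per_fp, high_per_fp)    # 1400.0 650.0
```

Under these assumptions the high-level version costs less than half as much in total, but its cost per line of code is nearly double, so a LOC-based comparison would wrongly favor the low-level language. Cost per function point ranks the two correctly.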
Collectively, many or most of these thirty harmful practices are noted in more than 75 percent of software applications of 10,000 function points or more. Below 1,000 function points, the significance of many of these declines, and they would drop out of the malpractice range.
Summary and Conclusions
The phrase “software engineering” is actually a misnomer. Software development is not a recognized engineering field. Worse, large software applications fail and run late more often than they succeed.
There are countless claims of tools and methods that are advertised as improving software, but a severe shortage of empirical data on things that really work. There is also a shortage of empirical data on things that cause harm.
The simple scoring method used in this article attempts to provide at least a rough correlation between methods and practices and their effectiveness, quality, and productivity. The current results are somewhat subjective and may change as new data becomes available. However, the scoring method does illustrate a wide range of results from extremely valuable to extremely harmful.
References and Suggested Readings
Bundschuh, Manfred and Dekkers, Carol; The IT Measurement Compendium; Springer-Verlag, Heidelberg, Germany; ISBN 978-3-540-68187-8; 2008.
Charette, Bob; Software Engineering Risk Analysis and Management; McGraw Hill, New York, NY; 1989.
Ewusi-Mensah, Kweku; Software Development Failures; MIT Press, Cambridge, MA; 2003; ISBN 0-262-05072-2; 276 pages.
Galorath, Dan; Software Sizing, Estimating, and Risk Management: When Performance is Measured Performance Improves; Auerbach Publishing, Philadelphia; 2006; ISBN 10: 0849335930; 576 pages.
Garmus, David and Herron, David; Function Point Analysis – Measurement Practices for Successful Software Projects; Addison Wesley Longman, Boston, MA; 2001; ISBN 0-201-69944-3; 363 pages.
Gilb, Tom and Graham, Dorothy; Software Inspections; Addison Wesley, Reading, MA; 1993; ISBN 10: 0201631814.
Glass, R.L.; Software Runaways: Lessons Learned from Massive Software Project Failures; Prentice Hall, Englewood Cliffs; 1998.
International Function Point Users Group (IFPUG); IT Measurement – Practical Advice from the Experts; Addison Wesley Longman, Boston, MA; 2002; ISBN 0-201-74158-X; 759 pages.
Johnson, James et al; The Chaos Report; The Standish Group, West Yarmouth, MA; 2000.
Jones, Capers and Bonsignour, Olivier; The Economics of Software Quality; Addison Wesley, Boston, MA; 2012; ISBN 10: 0-13-258220-1; 587 pages.
Jones, Capers; Software Engineering Best Practices; McGraw Hill, New York, NY; 2010; ISBN 978-0-07-162161-8; 660 pages.
Jones, Capers; Applied Software Measurement, 3rd edition; McGraw Hill, New York, NY; 2008; ISBN 978-0-07-150244-3; 575 pages.
Jones, Capers; Assessment and Control of Software Risks; Prentice Hall, 1994; ISBN 0-13-741406-4; 711 pages.
Jones, Capers; Software Quality – Analysis and Guidelines for Success; International Thomson Computer Press, Boston, MA; ISBN 1-85032-876-6; 1997; 492 pages.
Jones, Capers; Estimating Software Costs; McGraw Hill, New York; 2007; ISBN 13-978-0-07-148300-1.
Jones, Capers; Software Assessments, Benchmarks, and Best Practices; Addison Wesley Longman, Boston, MA; ISBN 0-201-48542-7; 2000; 657 pages.
Jones, Capers; “Sizing Up Software”; Scientific American Magazine, Volume 279, No. 6, December 1998; pages 104-111.
Jones, Capers; Conflict and Litigation Between Software Clients and Developers; Software Productivity Research, Inc.; Narragansett, RI; 2008; 45 pages.
Jones, Capers; “Preventing Software Failure: Problems Noted in Breach of Contract Litigation”; Capers Jones & Associates, Narragansett, RI; 2008; 25 pages.
Kan, Stephen H.; Metrics and Models in Software Quality Engineering, 2nd edition; Addison Wesley Longman, Boston, MA; ISBN 0-201-72915-6; 2003; 528 pages.
McConnell, Steve; Software Estimation: Demystifying the Black Art; Microsoft Press, Redmond, WA; 2006.
McConnell, Steve; Code Complete; Microsoft Press, Redmond, WA; 1993; ISBN-13: 978-1556154843; 886 pages.
Pressman, Roger; Software Engineering – A Practitioner’s Approach; McGraw Hill, NY; 6th edition, 2005; ISBN 0-07-285318-2.
Radice, Ronald A.; High Quality Low Cost Software Inspections; Paradoxicon Publishing, Andover, MA; ISBN 0-9645913-1-6; 2002; 479 pages.
Wiegers, Karl E.; Peer Reviews in Software – A Practical Guide; Addison Wesley Longman, Boston, MA; ISBN 0-201-73485-0; 2002; 232 pages.
Yourdon, Ed; Death March - The Complete Software Developer’s Guide to Surviving “Mission Impossible” Projects; Prentice Hall PTR, Upper Saddle River, NJ; ISBN 0-13-748310-4; 1997; 218 pages.
Yourdon, Ed; Outsource: Competing in the Global Productivity Race; Prentice Hall PTR, Upper Saddle River, NJ; ISBN 0-13-147571-1; 2005; 251 pages.
International Software Benchmarking Standards Group (ISBSG): www.ISBSG.org
International Function Point Users Group (IFPUG): www.IFPUG.org
Process Fusion: www.process-fusion.net
Project Management Institute (www.PMI.org)
Software Engineering Institute (SEI): www.sei.cmu.edu
Software Productivity Research (SPR): www.SPR.com