Big data isn’t just a buzzword; it lives in your software. With millions of possibilities to leverage analytics, how do you pick what’s right for your organization? Robert Cross provides some insight into how to start incorporating data analytics into your software process and management plan.
As a provider of independent software quality and security assessments, my company has the unique privilege of examining our clients’ source code in great detail for all different types of risks. Many of our new relationships start off with our clients finding themselves in a “code-red” situation and in need of perspective. This might make some of you giggle and others cry, but we’ve all been part of projects forced to release software in less time than process said was possible (i.e., code bending).
The first face-to-face meeting we hold with our clients is usually the same across our accounts. The testing, development, and management teams look very tired. If they still allowed smoking in office buildings, everyone would be lighting up in the conference room. An argument typically ensues among the groups about the root cause of the problem and ends with the executive decision maker looking at us and saying, “Help us figure it out and throw everything you’ve got at it!” This can be translated to mean “Measure anything and everything that has a heartbeat of a chance of getting us out of this ditch, and, by the way, we needed it yesterday.”
We then ask if the teams are subject to any particular standards, whether driven internally, by the industry, or by the customer. In cases where there are standards in place, we have found that the engineers know about them but did not have time for due diligence because of schedule compression. It’s rarely a case of the team not knowing “how” or “what”; rather, it’s the “when” that is the main culprit of chaos.
The process for analyzing software that your team has not authored is tedious and measures thousands of “things,” including quality risks (null pointers), defensive programming risks (exception handling), security risks (buffer overflows), and metrics (cyclomatic complexity), to name a few broad categories. This type of process incorporates numerous technologies to generate a large software risk-data profile of the client’s system. From this, experts analyze and distill the data down to true positive findings and document them in various report formats, tailored to the audience, that explain the system’s specific and unique risk signature.
The key is balancing the importance of various data sets, separating what’s relevant from what’s nice to have, which adheres to the “count what counts” principle. I specifically remember one particular customer where one of the senior executives proclaimed with absolute certainty, prior to our analysis of his system, “We have no default-less switches in our code, no way, no how! This is in our standards and our engineers follow strict guidance!” Well, wouldn’t you know that we identified that his system had over 16,000 of them. We were all shocked that day because of the certainty of this executive’s statement. That meeting was a tough one to sit through, because the manager’s entire belief system was based on the assumption that his team members were following and not deviating from their defined software process.
However, there was no mechanism in place for anyone on the team, including management, to count what counts. Is measuring the number of default-less switches important? If it’s important to your program and to you, then it should be counted. Find a way to collect the data, correlate it to specific risk factors, and collaborate on these results across your teams to establish expectations and communicate the importance of this measurement. There are thousands of opinions out there on what should and shouldn’t be counted.
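To make the “collect” step concrete, here is a minimal sketch of how a team might count default-less switches in C-like source. It is an illustration only, not the tooling used in our assessments: the function name `count_switches` and its helpers are hypothetical, and the approach is a text heuristic (it strips comments and literals, then matches braces) rather than a real parser, so preprocessor tricks or brace-less switch bodies can fool it.

```python
import re

# Comments and string/char literals are removed first so that keywords
# appearing inside them are not mistaken for code.
_NOISE = re.compile(
    r"//[^\n]*"                 # line comments
    r"|/\*.*?\*/"               # block comments
    r'|"(?:\\.|[^"\\])*"'       # string literals
    r"|'(?:\\.|[^'\\])*'",      # character literals
    re.DOTALL,
)
_SWITCH = re.compile(r"\bswitch\b")
_DEFAULT = re.compile(r"\bdefault\s*:")


def _matching_brace(text, open_idx):
    """Return the index of the '}' that closes the '{' at open_idx, or -1."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    return -1


def _without_nested_switches(body):
    """Drop the bodies of nested switches so their labels are not counted."""
    kept, pos = [], 0
    while True:
        match = _SWITCH.search(body, pos)
        if not match:
            break
        open_idx = body.find("{", match.end())
        close_idx = _matching_brace(body, open_idx) if open_idx != -1 else -1
        if close_idx == -1:
            break
        kept.append(body[pos:open_idx])
        pos = close_idx + 1
    kept.append(body[pos:])
    return "".join(kept)


def count_switches(source):
    """Return (total_switches, switches_without_default) for C-like source."""
    text = _NOISE.sub(" ", source)
    total = missing = 0
    for match in _SWITCH.finditer(text):
        open_idx = text.find("{", match.end())
        close_idx = _matching_brace(text, open_idx) if open_idx != -1 else -1
        if close_idx == -1:
            continue  # unbalanced or unusual code: skip rather than guess
        own_body = _without_nested_switches(text[open_idx + 1:close_idx])
        total += 1
        if not _DEFAULT.search(own_body):
            missing += 1
    return total, missing
```

Run over each file in a code base and summed, the second number gives a simple trend line to track from release to release, which is usually enough to start the conversation about whether the standard is actually being followed.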
After analyzing billions of lines of code as an independent assessor, my opinion is to start small and count a couple of key metrics. More important is to learn how these metrics correlate to business risks relevant to both engineers and executives; this allows you to create a common language framework so that, when brought together, the consumers of this information aren’t talking past one another. Once your organization gets its arms around a small set, expand it to incorporate other metrics, again collecting data, correlating it to risk, and collaborating on it from top to bottom in your organization.
Why is the “count what counts” notion so important? Here is the truth. Schedule compression happens every day to all of us, no matter what process you’re following. The customer, the market, upper management, or the competition causes us to make decisions to take on exceptional risk. The art of bending space, time, and code will never go away; it’s the blessing and curse of software. However, other business disciplines have systems and measures in place so that when they shortcut process, there is an audit trail of data proactively alerting them to the consequences of that decision. They have transparency into their technical debt, whereas software is still catching up and has a ways to go.
One of the joys of being in this aspect of the business is enabling teams to have that “Aha!” moment by realizing the power of data analytics when done properly. Having an opportunity to unify a sometimes-fractured team by focusing the conversation on data and not opinions is truly an amazing and fun experience. Of course hindsight is always 20/20, but if it leads to customers changing their strategy to focus on important data rather than just tools or buying the newest widget, then it’s a big step in changing our industry from being reactive to proactive.