More Than a Score: Taking a Deeper Dive into Your Metrics


In data analysis, one of the key things we provide is an accurate measure of performance against a set of criteria considered an appropriate yardstick for the context. These measures go by many labels, including key performance indicator (KPI), balanced scorecard, traffic light, metrics, glass table, and dashboard. I think most people in software feel a mix of anticipation and dread about them!

All these terms are just a means for different layers in our operation to have a common way of evaluating performance. If a number gets better, doughnuts for everyone, and we are celebrated for our skills and expertise. Should the numbers get worse, people play the blame game to find out who is responsible and what corrective action must be taken. However, neither extreme is correct.

We should be using metrics to show our performance against the standard and then see where changes are needed or where to take a deeper dive.

A metric is just a measure of something at a moment in time. Like the weather, it changes due to any number of factors, not all of which can be known or controlled.

One key benefit of metrics is that they can be measured using a standard process; we can explain the numbers, and leadership can understand what they mean. The downside is that a metric is only a measurement, so issues can easily hide until they become problems, and great work can also go unrepresented.

Sporting events are a great example: The end score tells you who won, but not the details of the game. How many times have you seen a game dominated by one team—and clearly the better team—when at the last minute, something happens that allows the other team to win? If we just look at the final score, we get a measure but lose the relevant details of the game. To really understand the full game behind the score, we need to look deeper.

When I am generating the metrics, I am also looking for trends and patterns hidden in the data. In most processes, there is a certain amount of variation (or noise) that is just considered normal. What we should be more concerned with are events that are not normal, or a trend that signals a coming problem.

In manufacturing, you see this kind of thing with an X-bar chart, a type of control chart. This tool is a way to account for the normal noise in a process and look for the patterns. Ideally, you could filter out the noise, but that is frequently not possible.

In the example below, the blue line has normal noise and moves above and below the midpoint (target) on a random basis. The red line has patterns where it trends up or down for a number of periods in a row until a correction occurs to “reset” it. The earlier we can detect the need to make the adjustment, the closer we can stay to the target.

Sample X-bar chart
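
The difference between the two lines can also be checked in code. The sketch below is one minimal way to find stretches where a reading rises or falls for several periods in a row, the pattern described for the red line. It is not a full X-bar calculation (no subgroups or control limits); the function name `trend_runs`, the `min_points` cutoff, and the sample readings are all assumptions made for illustration.

```python
def trend_runs(values, min_points=6):
    """Find stretches of at least min_points readings that rise (or fall) every period.

    Returns a list of (start_index, end_index, "up" or "down") tuples.
    """
    runs = []
    start = 0        # index where the current stretch began
    direction = 0    # +1 rising, -1 falling, 0 flat or not yet known
    for i in range(1, len(values)):
        # +1 if this reading is higher than the last, -1 if lower, 0 if equal
        step = (values[i] > values[i - 1]) - (values[i] < values[i - 1])
        if step == 0 or step != direction:
            # The previous stretch ended at i - 1; keep it if it was long enough.
            if direction != 0 and i - start >= min_points:
                runs.append((start, i - 1, "up" if direction > 0 else "down"))
            start = i - 1 if step != 0 else i
            direction = step
    # Close out a stretch that is still running at the end of the data.
    if direction != 0 and len(values) - start >= min_points:
        runs.append((start, len(values) - 1, "up" if direction > 0 else "down"))
    return runs


# Made-up readings around a target of 50 (illustration only).
noisy = [50, 52, 49, 51, 50, 48, 51, 49]      # bounces around the target
drifting = [50, 51, 52, 53, 54, 55, 50, 49]   # climbs, then gets "reset"

print(trend_runs(noisy, min_points=5))     # []
print(trend_runs(drifting, min_points=5))  # [(0, 5, 'up')]
```

A smaller `min_points` catches drift earlier but also raises more false alarms on normal noise, so the cutoff is a judgment call for each process.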

For most of the business operations metrics I generate, I don’t build this kind of X-bar chart. Instead, we are looking for two things in our metrics:

  • Are the systems being monitored in control and being maintained adequately?
  • Are there trends in our data that show us going in the wrong direction: increasing levels of risk, more frequent outages, or other signs that show a need to act?

By having a view of what is “normal,” we can spot unexpected issues or events and flag them for further action.
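
One simple way to put a number on "normal" is to compare each month with the months just before it. The sketch below flags a month when its count falls more than a chosen number of standard deviations from the trailing average. The function name `flag_unexpected`, the six-month window, the threshold of 2, and the sample counts are assumptions for illustration, not values taken from any real system.

```python
from statistics import mean, stdev


def flag_unexpected(monthly_counts, window=6, threshold=2.0):
    """Return (index, value, baseline_mean) for months far from the trailing baseline."""
    flagged = []
    for i in range(window, len(monthly_counts)):
        history = monthly_counts[i - window:i]   # the months just before month i
        center = mean(history)
        spread = stdev(history)
        value = monthly_counts[i]
        if spread == 0:
            # A perfectly flat history: any change at all is unexpected.
            unusual = value != center
        else:
            unusual = abs(value - center) > threshold * spread
        if unusual:
            flagged.append((i, value, center))
    return flagged


# Hypothetical monthly ticket counts for one system.
counts = [12, 14, 13, 12, 15, 13, 14, 30, 13]
print(flag_unexpected(counts))  # [(7, 30, 13.5)]
```

A flagged month is only a prompt to look deeper; the number alone does not say whether the cause is a real problem, a one-time event, or a change in how the data is recorded.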

I collect data on about 350 systems and pieces of hardware in our IT network, and I look at a number of different elements to better understand our situation.

Changes by application:

  • Number of normal change tickets per month: IT systems need regular maintenance to remain safe and relevant in the market. You can look over a period of time to see what a “normal” number of changes is, and when an increase (or decrease) happens, you have the opportunity to see if something has changed.

  • Number of “emergency” changes: There are times when a change cannot wait for the normal change cycle and has to be made right away. By looking at this, you can see how stable an application is and how well it is being managed. A system with a lot of emergencies may have issues in how it is being managed or other problems that need to be dealt with. As the keeper of metrics, you are in a great place to see the big picture and alert management to the risk.

  • Change tickets without a corresponding incident: Ideally, when a user has an issue, they log the incident, and when a change is deemed needed, a related change ticket is created. If you see a lot of change tickets without related incidents, this can be a signal that users are going directly to development for support and you are lacking visibility into issues being experienced by the customers.
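
The three change measures above can be computed from an export of change tickets. The sketch below assumes each ticket is a record with the hypothetical fields `app`, `opened`, `kind` ("normal" or "emergency"), and `incident_id` (`None` when no incident is linked). Your change-management tool will name these differently, and the sample tickets are made up.

```python
from collections import Counter
from datetime import date

# Hypothetical change tickets (illustration only).
changes = [
    {"app": "billing", "opened": date(2024, 1, 9), "kind": "normal", "incident_id": "INC-101"},
    {"app": "billing", "opened": date(2024, 1, 23), "kind": "normal", "incident_id": None},
    {"app": "billing", "opened": date(2024, 2, 6), "kind": "normal", "incident_id": "INC-117"},
    {"app": "billing", "opened": date(2024, 2, 14), "kind": "emergency", "incident_id": "INC-120"},
    {"app": "portal", "opened": date(2024, 1, 11), "kind": "emergency", "incident_id": None},
    {"app": "portal", "opened": date(2024, 1, 30), "kind": "emergency", "incident_id": None},
    {"app": "portal", "opened": date(2024, 2, 2), "kind": "normal", "incident_id": None},
    {"app": "portal", "opened": date(2024, 2, 20), "kind": "emergency", "incident_id": "INC-131"},
]


def changes_per_month(changes, kind):
    """Count change tickets of one kind per (app, 'YYYY-MM')."""
    counts = Counter()
    for ticket in changes:
        if ticket["kind"] == kind:
            counts[(ticket["app"], ticket["opened"].strftime("%Y-%m"))] += 1
    return counts


def share_by_app(changes, predicate):
    """Fraction of each app's change tickets for which predicate(ticket) is true."""
    total = Counter()
    hits = Counter()
    for ticket in changes:
        total[ticket["app"]] += 1
        if predicate(ticket):
            hits[ticket["app"]] += 1
    return {app: hits[app] / total[app] for app in total}


normal_per_month = changes_per_month(changes, "normal")
emergency_share = share_by_app(changes, lambda t: t["kind"] == "emergency")
unlinked_share = share_by_app(changes, lambda t: t["incident_id"] is None)

print(normal_per_month[("billing", "2024-01")])  # 2
print(emergency_share)  # {'billing': 0.25, 'portal': 0.75}
print(unlinked_share)   # {'billing': 0.25, 'portal': 0.75}
```

In this made-up data, the portal application stands out on both emergency changes and changes without a linked incident, which is the kind of pattern worth raising with management. Routine maintenance changes will not have an incident either, so a high unlinked share is a reason to ask questions, not proof of a problem.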

Some issues from users have to do with training or access questions and do not require a change, while others do result in changes. You can learn a lot about systems by looking at the incident tickets. (Your company may call them problem tickets, so be sure you understand your issue-tracking system.)

Incidents by application:

  • Tickets opened per month: This metric gives you a fast way to see which systems are causing the most issues with your customers.

  • Tickets closed per month: This metric allows you to understand how your support team is handling the load.

  • Ratio of opened to closed tickets per month: Ideally, the opened and closed values should be about the same, but when a new application comes online, it is typical to see the number of tickets opened exceed the number closed. Keeping an eye on the backlog is a key way to see how the application and the support team are doing.

  • Ticket aging: Some tickets take longer to solve than others. We have all had a user report an issue and then go on vacation, so it can’t be resolved. But keeping track of the average days a ticket is open is another great way to monitor your system’s health.

  • Tickets by support person: This can be a little more controversial, but your team likely has some support employees who are faster on tickets and others who are slower. You also likely have senior team members who get the harder tickets that take longer to solve. Looking at the data this way can be informative, but remember that it should not be the only measure of team member quality or efficiency.
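
The incident measures above can be sketched the same way. The example below assumes each incident is a record with the hypothetical fields `app`, `opened`, `closed` (`None` while the ticket is still open), and `assignee`; the tickets and names are made up. To get per-application numbers, filter the list by `app` before calling the functions.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean, median

# Hypothetical incident tickets (illustration only).
incidents = [
    {"app": "billing", "opened": date(2024, 1, 3), "closed": date(2024, 1, 5), "assignee": "ana"},
    {"app": "billing", "opened": date(2024, 1, 10), "closed": date(2024, 2, 9), "assignee": "raj"},
    {"app": "portal", "opened": date(2024, 1, 15), "closed": date(2024, 1, 16), "assignee": "ana"},
    {"app": "portal", "opened": date(2024, 2, 1), "closed": None, "assignee": "raj"},
    {"app": "portal", "opened": date(2024, 2, 12), "closed": date(2024, 2, 15), "assignee": "ana"},
    {"app": "portal", "opened": date(2024, 2, 20), "closed": None, "assignee": "raj"},
]


def opened_closed_per_month(incidents):
    """Count tickets opened and tickets closed per 'YYYY-MM'."""
    opened, closed = Counter(), Counter()
    for ticket in incidents:
        opened[ticket["opened"].strftime("%Y-%m")] += 1
        if ticket["closed"] is not None:
            closed[ticket["closed"].strftime("%Y-%m")] += 1
    return opened, closed


def backlog_by_month(opened, closed):
    """Running count of tickets still open at the end of each month."""
    backlog, running = {}, 0
    for month in sorted(set(opened) | set(closed)):
        running += opened[month] - closed[month]
        backlog[month] = running
    return backlog


def average_open_age_days(incidents, as_of):
    """Average age in days of tickets that are still open, or None if there are none."""
    ages = [(as_of - t["opened"]).days for t in incidents if t["closed"] is None]
    return mean(ages) if ages else None


def median_days_to_close_by_assignee(incidents):
    """Median days from open to close for each support person's closed tickets."""
    durations = defaultdict(list)
    for ticket in incidents:
        if ticket["closed"] is not None:
            durations[ticket["assignee"]].append((ticket["closed"] - ticket["opened"]).days)
    return {who: median(days) for who, days in durations.items()}


opened, closed = opened_closed_per_month(incidents)
print(backlog_by_month(opened, closed))                  # {'2024-01': 1, '2024-02': 2}
print(average_open_age_days(incidents, date(2024, 3, 1)))  # 19.5
print(median_days_to_close_by_assignee(incidents))       # {'ana': 2, 'raj': 30}
```

The per-person figures show why this view needs care: in this made-up data one person has a much higher median, but that person also holds the tickets that are still open, which may simply be the harder ones.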

Each of these data points can identify a different way to better understand your customers and the issues they may have. Like the final score of a sports game, the initial numbers do not always tell the full story. More detailed data perspectives can provide a fuller picture and lead to improved operations.

The data elements that you have access to will not be the same as the data I have, but the concepts are the same: A better understanding of the situation leads to happier customers.


StickyMinds is a TechWell community.
