Useful Metrics and the Problem with Performance Testing Programs: An Interview with Scott Barber

interview

March 21, 2014

Summary

Scott Barber is the chief performance evangelist for SmartBear and an author of several books, including Performance Testing Guidance for Web Applications. In this interview, Scott chats about useful test metrics, communication, and the problem with performance testing programs.

JV: I’m here with Scott Barber. You’re a familiar face around these parts. I’m glad to have you talking with us today. Can you explain your background to our listeners and readers?

SB: Sure. I’ve primarily been a performance tester for the last fifteen years. Although I say performance tester, I’ve been doing a lot of training, consulting, and speaking. I’ve managed to get a couple of books out there. That’s always a lot of fun. Actually, it’s just a lot of work. I started out, believe it or not, with a degree in civil engineering.

JV: I didn’t know that.

SB: Yeah, I paid for that with an Army ROTC scholarship, so then I was an Army officer for a while. I got recruited out to update one of the military systems that, needless to say, was a little behind the times.

JV: Yeah, I can imagine that.

SB: I bounced around a little bit before I ended up in the developing and testing space. It took a little bit of a winding path to get here, but I’ve enjoyed it since I got here.

JV: And you’ve got some upcoming sessions at STARCANADA, so I thought it would be good to talk about that. Why don’t we get into the topic of metrics? Explain to me how metrics can destroy your soul.

SB: Metrics are a really interesting topic psychologically, which is not what we normally think of when we’re talking about metrics, right?

JV: Right.

SB: You can think about it for yourself. In any job you’ve ever had, or back when you were in school, what’s the first thing you try to do when you’re taking a class? You try to figure out what the teacher grades on, what they want in order for you to get your A. As a student, that’s your goal in life: to do the least work possible and get an A.

Your goal is supposed to be to learn everything you’re supposed to learn. But that’s not what you do. You game the metric. It’s human nature. Some people take it to an extreme, don’t get me wrong, but it’s just human nature.

Now let’s take this into tester land. Let’s say your boss or the executive is interested in tracking bug counts.

JV: A quantifiable measurement, a number.

SB: That’s right. It’s a number. So, as a tester, you kind of end up finding out a sweet spot; if every week I report between twenty-five and thirty bugs, then the boss thinks I’m actually doing my job. But, I’m not reporting so many bugs that he thinks I’m gaming the system. So, every week I’m going to try to find twenty-five-to-thirty bugs. Which means this week, “Hey, you know what? I’ve got a bunch of great bugs on Monday. What am I doing the rest of the week? Not a whole lot.”

Most people are more ethical than that. I’m taking it to an extreme.

JV: Yeah, these are the coasters.

SB: Right. The problem is pretty much any metric that we’ve come up with as a singular measure, when it comes to testing and development, at best tells a tiny part of the story. And so if you get overly focused on any one, you end up with really unintended consequences. So when you figure that out, what’s the next thing you do? You build this metrics program with tons and tons of metrics.

And what’s the side effect of that? All anybody has time to do is collect metrics, so no work gets done. So, finding that weird balance between which metrics are useful and how to collect enough without being so burdensome that everyone’s job becomes creating metrics instead of developing and testing software is really challenging.

JV: It’s sort of making data. You’re making data and adding data to work.

SB: Right. And what we need is not data. What we need is information. What the talk is about, aside from some cool stories, or maybe not cool but funny stories… a couple of rants about what’s gone wrong.

JV: We love rants.

SB: What we get into is some approaches to doing a little better at sharing the information that folks reasonably need, without so much of this struggle with metrics and human nature and psychology and all that craziness.

JV: What constitutes a good measurement? What is a useful metric?

SB: As a performance tester, I will tell you that I spend most of my time dealing with measurements, with data points. A lot of data points. But the flip side is I don’t want to be handed rolled-up consolidations and summaries. I want the raw data. That means I deal with a lot of it.

Here’s the thing: I go ahead and roll it up. Here’s what I teach people: I say, look, you have analyzed this. You know or have figured out what the interesting part of this data is. So, you are going to present the interesting part of this data, but don’t present it by itself. I tell everybody that what your goal ought to be is one picture and a couple of bullets or maybe up to a paragraph, because the data alone doesn’t tell the whole story.

For example, I could tell you that on some teams a common measurement is “defect removal rate.” Basically, what that means is a defect is found; how long does it take before it’s fixed? And over a big enough data set with a mature team, that can be a pretty decent measurement… right up until something crazy happens. Like, I don’t know, Bob’s wife has a baby and he takes two weeks off and a whole bunch of bugs are assigned to him.

Now, instead of him resolving them in his normal three hours, they’re just sitting there for two weeks. So your defect removal rate curve is all messed up, for a reason that has nothing to do with the metric.

JV: And that completely distorts the big picture. It’s sort of like bad data. It’s an outlier.

SB: Exactly. And if you tell the story behind the data, if you show the data and say this is truth because those bugs really still are sitting there open, then it’s true. But if what you’re measuring, if what you’re interested in long-term is the actual efficiency in removing defects, we probably should not count vacation time and time off. We should take that out of our curve.

So the same data, from week to week I can say, hey, I’ve got bugs that aren’t getting resolved so it’s good. In terms of what is my team’s historical efficiency, that’s going to skew your norm. So when you ask what is a good metric, my answer is a metric that comes with a story that actually leads stakeholders to be able to take the right action.
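Editor's note: the vacation example above can be made concrete with a small sketch. It is not from the interview; the function name, the dates, and the idea of subtracting time-off windows from each defect's open time are illustrative assumptions, and the windows are assumed not to overlap one another.

```python
from datetime import datetime, timedelta
from statistics import mean

def hours_open(opened, fixed, time_off=()):
    """Hours a defect stayed open, minus overlap with (start, end) time-off windows.

    Assumes the time-off windows do not overlap each other.
    """
    total = fixed - opened
    for off_start, off_end in time_off:
        overlap_start = max(opened, off_start)
        overlap_end = min(fixed, off_end)
        if overlap_end > overlap_start:
            total -= overlap_end - overlap_start
    return total.total_seconds() / 3600

# Hypothetical data: two ordinary defects, plus one assigned to Bob
# just as his two-week leave begins and fixed three hours after he returns.
day = datetime(2014, 3, 3, 9, 0)
bob_leave = [(datetime(2014, 3, 4, 9, 0), datetime(2014, 3, 18, 9, 0))]
defects = [
    (day, day + timedelta(hours=3), []),
    (day, day + timedelta(hours=4), []),
    (datetime(2014, 3, 4, 9, 0), datetime(2014, 3, 18, 12, 0), bob_leave),
]

# "Raw" answers: how long did bugs actually sit open?
raw = [hours_open(opened, fixed) for opened, fixed, _ in defects]
# "Adjusted" answers: how efficient is the team when people are at work?
adjusted = [hours_open(opened, fixed, off) for opened, fixed, off in defects]

print(f"raw mean hours open:      {mean(raw):.1f}")
print(f"adjusted mean hours open: {mean(adjusted):.1f}")
```

Both numbers are true. Which one to report depends on the question the stakeholder is asking, which is why the figure needs its story attached.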

JV: It’s what stakeholders are really looking for when they request specific metrics. This is what you are referring to.

SB: Right. Too many times managers come and say I need these three metrics. And we provide them because hey, they’re our boss. Other times a manager will come and say I need metrics. And so we give them stuff that makes sense to us.

But I think the right approach to finding the right metric for your team right now is to instead start with the question: What is it we’re trying to learn? What is it we’re trying to manage? Let’s work together and find the right way to provide the information. Maybe it’s metrics; maybe it’s something else.

But instead of starting from “Give me metric X for my buddy Joe and this other company that works for him,” how about we start with “What do you want to know?” And let’s engineer a way for you to get that information. I think that’s the way to build a good program.

JV: For me it’s about articulation and proper communication between both parties. That’s what it seems like to me. It seems less of a “here is my report that I’m turning in to you” business; I’m going to go do my own thing; I’ll send you another report via email next week.

SB: I will tell you that when you get out into the industry, every team is different. I’ve got a consultant’s bias, which means that probably 85 percent of the folks call me because they know something is broken. I haven’t been hurting for finding people to call me to come help. So clearly there is a lot of broken out there.

I don’t want to say that everybody is broken, but I do want to say that two-way trust and communication is frequently enough broken that it’s certainly worth having a conversation about how to get there. Because I think you’re right. If one way or another we build those open lines of communication and that transparency throughout our process, then the need for metrics or for so many metrics diminishes. I won’t say that it goes away, but… Here’s what I can tell you for sure. The project managers that are on the floor, in the cube farm or in the agile space or whatever, with their team most of the time, they don’t have a whole lot of metrics because they just feel what’s going on.

It’s not that they feel it and they’re guessing; they see it, they hear what the developers are talking to each other about. They hear the testers and what they’re complaining about. So they’re part of the process. And they don’t need a bunch of numbers to tell the story. They don’t need you to try to tell the whole story with numbers.

And in other places you’ve got managers managing six teams. Right now, working with SmartBear, I’m the product guy for four products. One team is in Florida, one team is in Stockholm, and two teams are in Russia.

Let me tell you, I need some insight. We try to talk and I depend on the conversation more than I depend on the numbers, but I admit I ask them to give me something. Now, we have had conversations where I’ve said, “Hey, start sending me what you’re comfortable with and I’ll ask questions.” So I’m not running trend metrics and that sort of thing. Partially because I’ve been doing this for six weeks and we haven’t figured it all out yet.

JV: Right, you’re just learning.

SB: But my starting point was, “Guys, I need some insight. I can’t talk to you as much as I’d like, so what are we going to do? Okay, what are you comfortable with?” Let’s settle for what you’re comfortable with and I’ll ask questions, and we’ll evolve it. That’s been my approach.

JV: Context. It depends on the context of each project. And then that opening conversation would be how are we going to do this? How are we going to plan on talking about this?

SB: Exactly.

JV: You are also doing a tutorial and I wanted to talk a little about that. Why are some performance testing programs often insufficient to keep pace with expectations and pressures?

SB: I’ll tell you why. Performance testing in so many ways is kind of a black sheep. And security testing kind of falls in the same bucket. Everybody knows it’s important and everybody knows it’s hard. Everybody knows that if you just do everything right from the beginning you’re going to be fine. If you mess up in production once in a while and you fix it quick, you’re probably going to survive.

So everybody is rolling the dice all the time and saying, “I’m not going to invest in this.” Or “I’m only going to invest in part of this.” The challenge that I see is when a company or a team, for the first time especially, is trying to get serious about performance, they say, “We don’t know about performance and load-testing stuff but we know we need to get serious about it.”

Then they do what any normal person would do: ask Uncle Google. And what pops up? A whole bunch of marketing material from vendors.

JV: Oh yeah, welcome to my life.

SB: There’s SmartBear, right? But I will tell you that if you did the same thing for practically anything else, you’d know that the people who are selling stuff are not always the best people to be giving you advice about what you ought to be doing.

JV: You don’t say! (Laughs)

SB: It’s a fact. We get second opinions from doctors; when we shop for a car we look at multiple dealerships; we check their rank. We know a little something about cars. Unfortunately, what happens is we first turn internally and say we’re going to slow down a little and focus on performance. Guess what? It gets better. Because when you take some time and some focus, you can improve things.

When that’s not enough, then what we tend to do is leap to the other side. We say we’ve got this release candidate or we’ve got this problem in production, so what we need to do is do a load test. The way I like to think about it, a load test is basically a production simulation. So let’s simulate production before we go to production, and you can stop what you’re doing.

JV: That makes sense; it seems like a prudent thing to do.

SB: The challenge is that it’s not as easy as it sounds. In most cases, or at least in many cases, it gets kind of expensive, and it’s kind of slow. It’s slow in the sense that as much as we all champion “test early, test often,” I can’t really provide a production simulation on a system that’s not done. Because then what am I simulating? It’s not production. It’s part of production.

JV: It’s a fragment of a production.

SB: Definitely. Exactly. So that tends to be a slow, tedious, and expensive process. So what I talk about is what is it that you can really do, really build in, from wherever you are, and start today. And that’s the key. It’s not start from the beginning or start from the end, it’s start from today. One of the things that everybody can do, and my target is–If everybody on the team intelligently spends five minutes a day doing performance related stuff, we can make a huge difference.

So it’s simple things. A lot of teams these days, agile teams, have their automated build validation tests. Some people call them different things. Did you break the build? Yes or no.

JV: Yeah, that’s pretty common.

SB: Exactly. It’s not that hard, within that same test framework, to stick a little timer around some things and ask the question: Is it returning in the same time, give or take, as it did last time? And if it got slower, did I expect it to get slower? If it got faster, did I expect it to get faster? I’m not necessarily saying pass-fail. In some cases there are things that we know well enough that we can say pass-fail. But in many cases all we can say is same or different. And the important question really is: Is it different in the way we expected? Or is it different in a way that we didn’t expect?

Some people will tell me you can’t time everything. It’s too small, it’s not going to be useful. I’m not saying you have to time everything. I’m saying that if the time is the key and you can get it in that test, why not? Five minutes. Because if a developer doesn’t know how to put “get time, get time, subtract, output” into their code, I’m thinking they might not be a developer for much longer.
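Editor's note: as a rough illustration of the "get time, get time, subtract" idea and the same-or-different question, here is a minimal sketch. It is not from the interview; the function names, the 25 percent tolerance, and the in-memory baseline dictionary are assumptions, and a real build would need to persist the baseline between runs.

```python
import time

def timed(check):
    """Run check() and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()            # get time
    check()
    return time.perf_counter() - start     # get time, subtract

def compare(name, elapsed, baseline, tolerance=0.25):
    """Label a timing 'new', 'same', 'slower', or 'faster' against the last run."""
    previous = baseline.get(name)
    if previous is None:
        return "new"
    if elapsed > previous * (1 + tolerance):
        return "slower"
    if elapsed < previous * (1 - tolerance):
        return "faster"
    return "same"

def run_and_compare(name, check, baseline, tolerance=0.25):
    """Time one check, compare it with the stored timing, then update the baseline."""
    elapsed = timed(check)
    verdict = compare(name, elapsed, baseline, tolerance)
    baseline[name] = elapsed
    return verdict, elapsed

if __name__ == "__main__":
    baseline = {}  # hypothetical store; in practice saved between builds
    for build in (1, 2):
        # time.sleep stands in for a real build validation check
        verdict, elapsed = run_and_compare(
            "login_smoke_test", lambda: time.sleep(0.05), baseline
        )
        print(f"build {build}: {verdict} ({elapsed:.3f}s)")
```

The verdict is deliberately not pass-fail. It only flags that something changed, leaving a person to decide whether the change was expected.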

So that’s just one example that we use. Is that ever going to replace the need for your production simulation? No. You know what it is going to do? It’s going to mean that when you do get there, when you decide the release is important enough that we really want to do a full-scale production simulation, we want to be really sure about this one, that when you get to that point with that release candidate—and I use the phrase release candidate because, like I said, I’m serving as a product owner right now. We’re in a team. They ship stuff; they commit every Friday. And I could push it to production every time. But I make the decision… this one’s small. Let’s not go to production this week. Let’s bundle and in the next two weeks we’ll have a big marketing.

It’s sitting out there for production, but we just haven’t turned it on. So when I say release candidate, that’s what I’m talking about. It doesn’t have to be a formal process. There are those times when you say this is going right on through, but we want these two things to come out together so hold off a while.

When you get those that you really want to test, go do your big load test. But here’s what you’re going to find. In your first run, where you’re ramping up that load, where what you used to find was I got just sticks… Before it was all broken. Now, you’re actually going to ramp up for a while. You’re not going to find things like, “Oops! I forgot to point to the production database.” Or “Oops! I left all my debug code in.” Because we’ve done enough along the way that we’ve worked out those really common, I call them whoopsits.

So instead of getting to the end and going, “Oh no, our architecture is broken!” we get to the end and maybe we say, “Maybe it’s good enough. Great!” or maybe “We should do a little tuning.” As opposed to “We need to start over, guys.”

JV: Yeah, that’s a big stumbling block.

SB: That’s really the basis of the talk. And T4 APM is my little…

JV: Yeah, I was going to ask you about that.

SB: So APM is application performance management. It used to be monitoring. Then a couple of years back they decided that monitoring something just tells you it is broke. Maybe we should do something about it if it’s broke. So they changed the word to management.

JV: A little more hands-on.

SB: Exactly. T4 is just a little cycle that I came up with one day while I was talking to a client at a whiteboard. It really stuck. They loved it, other people did, and I use it here. A lot of folks have applied it in other places. It stands for target, test, trend, and tune.

Just quickly, target is whatever I’m interested in. Some people could call a target a use case. Some people might call a target resource management. Whatever you are interested in, right? It avoids the whole vocabulary thing.

And the same with test. A test is anything I do to learn something about it. It might be a formal test, it might be an exploratory test, it might be setting up a monitor; which kind is not important to me within this particular cycle.

Trend is where the magic is. If you keep your data and you watch it over time, even pass-fail data, which is less interesting than numerical data, is still interesting. For example, if you’ve got a bunch of pass-fail tests and you’ve realized that exactly every third build fails, I’m thinking there’s something you can do to resolve that. So you can see patterns in data. Or maybe more specifically what you find is that every time this fails, these other three things fail, and we didn’t think they were associated. So to me the magic is in the trend.

And of course tune, because I’m a performance tester guy, I’m always thinking about tuning. But it’s the same as bug fixing. Do something about the stuff that you don’t like. You can apply that cycle micro, macro, early, late. Really, it doesn’t matter. It’s just that thought process of let’s boil this down to its most basic components, the really really dead-simple ones, and start there before we get hurt. Let’s build out from simple instead of over-complicating things and stalling because… Wow, if I try to do it all at once, guess what? It really is hard. But if I start with the easy stuff, what you find is that you actually see improvements. More times than you might expect, those improvements are good enough for quite a long time. You can get a whole lot of value if everybody puts in five minutes a day doing some thought-out, intelligent performance stuff. You eliminate a whole lot of unwanted surprises come production time.
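Editor's note: the two trend patterns mentioned above (a build that fails at a fixed interval, and tests that always fail together) can be spotted with very little code once the pass-fail history is kept. The sketch below is not from the interview; the function names, test names, and history data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def failure_period(results):
    """Return the gap between failures if it is constant, otherwise None.

    results is a list of booleans in build order, True meaning the build passed.
    """
    failing = [i for i, passed in enumerate(results) if not passed]
    gaps = {later - earlier for earlier, later in zip(failing, failing[1:])}
    return gaps.pop() if len(gaps) == 1 else None

def co_failures(history, min_builds=2):
    """Count pairs of tests that failed in the same build at least min_builds times.

    history is a list with one set of failing test names per build.
    """
    pairs = Counter()
    for failed in history:
        pairs.update(combinations(sorted(failed), 2))
    return {pair: count for pair, count in pairs.items() if count >= min_builds}

# Hypothetical build history: every third build fails.
builds = [True, True, False] * 3
print("failure period:", failure_period(builds))

# Hypothetical per-build failing tests: three tests tend to fail together.
history = [
    {"login", "profile", "search"},
    set(),
    {"login", "profile", "search", "checkout"},
    {"checkout"},
]
for pair, count in sorted(co_failures(history).items()):
    print(pair, "failed together in", count, "builds")
```

Neither function says why the pattern exists. They only surface it so that someone can ask the question.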

JV: All right, this seems like a good place to stop. Thank you very much for taking time out of your day.

SB: You too. Thanks again.

Scott Barber

Chief performance evangelist for SmartBear, Scott Barber is a respected leader in the advancement of software testing practices, an industry activist, and a load-testing celebrity of sorts. Scott authored several books (Performance Testing Guidance for Web Applications, Beautiful Testing, How to Reduce the Cost of Testing, and Web Load Testing for Dummies) and more than 100 articles and blog posts. Founder/president of PerfTestPlus, Scott co-founded the WOPR, served as director of the AST and CMG, and is a founding member of ISST. His industry writing, speaking, and activism focus on improving the effectiveness and business alignment of software development practices.

Topics:

agile, performance monitoring, performance testing, test management, test methodologies, test techniques, testing tools

About The Author

Jonathan Vanian

Jonathan Vanian has worked for newspapers, websites, and a magazine, and is not as scared of the demise of the written word as others may appear to be. Software and high technology never cease to amaze him.