Wednesday, August 12, 2009

METRIC OF THE MONTH

Whereby I become a skeet….

For those of you who don’t know what a “skeet” is, it’s a clay pigeon flung into the air used as target practice.

“Metrics” is a dirty word in Testing Land, but as a manager, I believe every competent senior resource out there should be familiar with and able to use metrics for the benefit of their team, themselves, and the company for which they work. Sticking your head in the sand like a flinking ostrich and saying “I won’t, I won’t!” merely ensures someone more politically savvy than you will crush you like a grape.

Big Boys and Girls in the business world read and use metrics in a variety of strategic ways, and if you want to survive and thrive in the business world, you need to be able to do the same. Again, only an immature resource refuses to do what they need to do to be successful and then complains when they don’t get any respect.

Some of the strategies supported by metrics aren’t very honest. But that’s a personal choice and it doesn’t have to be yours. Metrics are just numbers and nothing more. Without interpretation, they mean absolutely nothing. Used properly, metrics can be useful, interesting, surprising, and illuminating. It depends on what you collect, how, and why.

What I’m going to offer in “Metric of the Month” are those common metrics that I’ve used personally at various companies and that have proven useful in some way.

I’ll tell you how they were captured and used, what made them useful, and under what circumstances they aren’t useful at all. From there, you can make up your own minds as to what might work for you.

This first post is just to lay some ground rules and talk about metrics in general.

First, any type of statistic assumes you have something to count. If you don’t know how many tests you run, you have nothing to count. If you don’t know how many defects you write, or you don’t note the severities/priorities of those bugs, you won’t have anything to count. So if you work in an environment where none of those things matter, nothing with regard to metrics is going to be especially useful to you right now. But I think it’s still a good idea to understand and be able to use metrics, in case you move on from where you’re working now and they become more important to you at some future date.

If you have test ideas, conditions, scripts, cases, or charters written down somewhere, you can collect statistics. Now I’m aware of the arguments that during a given test a human being may make thousands of observations, any one of which may result in a defect report. But one can argue for weeks regarding whether that is just part of the process of running that test, or if every one of those observations should be documented (which would be a daunting task). For myself, I keep things pretty simple. I have a test condition, which is something I specifically want to examine. I might find something else as I work with that condition, which is an observation and linked to my test. You might feel more comfortable trying to document actions and system reactions. Well, we write test cases as well, but we do it after the testing effort as part of our maintenance process. Regardless, both of us have (some) kind of test idea in (some) kind of format, and that entity (or “test artifact”) can be counted.

Metrics are based on averages. There are times when averages are useful and times when averages are positively stupid. You, as a thinking entity, need to determine what is or is not useful information. One general rule of thumb is that the larger the effort, the more accurate and meaningful metrics become.

Consider, for example, a small, fiendishly complex change. You may only have two tests for it, and you might find one minor error. That equates to a 50% error rate. But what does that number really tell you? Nothing. And if you blend that 50% error rate into another error rate for a total error rate, the 50% might drag your numbers into the toilet. In other words, it might end up painting an inaccurate picture. If, however, you have 1000 tests for that change and you find 500 errors, that 50% error rate paints a more accurate picture of how things stand overall.
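For anyone who wants to see that arithmetic laid out, here is a minimal sketch in Python. The two-test change comes from the example above; the second row (1000 tests, 20 errors) and every name in the code are made up purely for illustration. It compares two ways of blending error rates, and then uses a Wilson score interval (one common statistical choice, not something I am prescribing) to show how much less a tiny sample tells you.

```python
import math

# Hypothetical numbers: (label, tests run, errors found).
# The first row is the two-test example from the text; the second is invented for contrast.
changes = [
    ("small, fiendishly complex change", 2, 1),
    ("large routine change", 1000, 20),
]


def error_rate(errors, tests):
    """Fraction of tests that found an error."""
    return errors / tests


# Blending by averaging the percentages gives every change equal weight,
# so the two-test change counts as much as the thousand-test change.
mean_of_rates = sum(error_rate(e, t) for _, t, e in changes) / len(changes)

# Pooling the raw counts weights each change by how many tests it had.
total_tests = sum(t for _, t, _ in changes)
total_errors = sum(e for _, _, e in changes)
pooled_rate = total_errors / total_tests


def wilson_interval(errors, tests, z=1.96):
    """Approximate 95% Wilson score interval for an error rate."""
    p = errors / tests
    denom = 1 + z**2 / tests
    centre = (p + z**2 / (2 * tests)) / denom
    half = z * math.sqrt(p * (1 - p) / tests + z**2 / (4 * tests**2)) / denom
    return centre - half, centre + half


print(f"average of the two rates: {mean_of_rates:.1%}")
print(f"pooled rate:              {pooled_rate:.1%}")
for errors, tests in [(1, 2), (500, 1000)]:
    low, high = wilson_interval(errors, tests)
    print(f"{errors}/{tests}: plausible range {low:.0%} to {high:.0%}")
```

With these made-up numbers, averaging the two percentages gives 26%, while pooling the counts gives about 2%: the two-test change drags the blended number around far more than it deserves. The interval makes the same point from another angle. One error in two tests is consistent with a true rate anywhere from roughly 9% to 91%, while 500 errors in 1000 tests pins it down to roughly 47% to 53%.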

Another general rule of thumb is that metrics are useless for determining individual worth (or lack thereof). People are smart enough to skew numbers to their own advantage, and you’ll end up encouraging Very Bad Behaviors if you try it.

Consider a testing staff “judged” by the number of defects they write. Oh, you’ll get LOTS of defects written. Good luck in getting all of the bogus, duplicate, and picayune enhancement defects out of your logs, however. Or dealing with the Major Drama that ensues when something good is submitted for test and even your most talented tester simply can’t find much wrong with it. Oddly enough, if you go out RIGHT NOW and take a look at your defect logs, counting who wrote how many bugs, you’ll probably find the numbers line up pretty much in the same order as you suspect in terms of testing talent on your team. Good testers DO inevitably find more bugs. But all of that will change if you start using metrics to “judge” the merits of your team. Some of the worst will end up looking like your best, and some of your best (if they even stick around) will look bad in comparison.

The same types of things happen when numbers are used “against” development staff and they are evaluated according to how many bugs manifest in their code. First of all, you might give your best developer the most complex code, which by its very nature will manifest more errors. And you might give them 4 times the amount of work as a less experienced resource. Evaluating development staff according to the number of errors found in their code is a huge mistake; no one will ever want to work on anything even mildly complex, and you’ll get kickback on every single error that is documented. The development team will hate you. YOU will hate you. Even your dog will hate you, just on general principles.

So I’d recommend avoiding what I call “Evil Metrics” and doing everything possible to avoid quantifying individual goodness with numbers. No matter how much you might hate dealing with “touchy-feely” stuff, people are not “data”.

I’m going to start this series with the type of metrics we customarily pull after a major code migration and work down into layers of complexity. This month, it’s going to be the percentage of error in production; look for the first post in the series during the next week or so. My Kevlar vest hasn’t arrived and the target I’ve painted on my back isn’t dry yet…

3 comments:

Simon Morley said...

Pull!!!! (To get the skeet to fly...)

Looking forward to the posts - not for the target practice(!) but because I'm genuinely interested in how metrics are used and communicated.

I collect a fair amount of metrics but only try to communicate an interpretation or judgement call - it's too easy for people (managers) to get fixated by the raw data. If anyone wants to know the reasoning/back up for the judgement then we can discuss the data.

Statistics are extremely useful but need some education/experience in their use. The saying "Lies, damned lies and statistics," has an element of truth! Handle with care!

Linda Wilkinson said...

Simon, I'm definitely going to shout "Pull!!" every time I post on metrics; I might even hide under my desk and not look at my mail for a few days afterwards...

Then again, controversy is good; it stimulates the brain cells and improves sales of adult beverages.

I agree with your comments. I couldn't get away with communicating nothing but interpretations in my present position; my executive managers (VP/CIO/CTO) love nitty-gritty detail. And lots of graphs.

- Linda

Joe said...

Thank you.

I look forward to seeing your discussions of non-evil metrics.