So in the last couple weeks, Metrics have become a candidate for the 7th plague.
Good God, y’all.
It rarely bothers me when someone disagrees with me. I actually enjoy debating and talking about our field. Sometimes after a really great discussion, I find I was wrong. That’s part of what makes it all so much fun. But I have pretty strong feelings about doing what you need to do to make yourself, your team, and your company successful, and that does not include being stubborn about metrics. Overall, what I feel about the fear and loathing of metrics is…pity. I hate seeing extraordinarily talented people limit themselves. And avoiding metrics can be a career-limiting move, particularly if management is your goal.
So it’s up to you – willing to listen and maybe add a few tools to your tool belt? You don’t HAVE to use every tool you’ve got, just be comfortable enough to use them when you need them. Yes? Great - then let’s take a look at some metrics that can Do Good Things for you, your team, and your company.
There are a bizillion different types of metrics out there and I’m going to start with one of the most common – percentage of error. It can be the overall percentage of error, the error in a given environment (usually production), or the percentage of error for a given project/product effort. I’ll start out with the most basic concept and we’ll slice and dice it down into a few commonplace sets of numbers.
Most of the “useful” metrics I’ve seen are pulled to help answer a question or help diagnose a problem. They can also be used as a way to monitor progress; we’ll talk about that later.
When you’re interested in the percentage of error overall or in a given environment, generally it is because you have questions or problems such as:
1. There have been an unusually high number of problems reported by your clients
(whether those are internal or external) and you’re trying to figure out what
happened;
2. People are complaining that the test team is missing too many errors and you’re
not sure what is true or where to focus your efforts;
3. Either you or another group has been working hard on doing (something) better
and you’d like to see if it made a difference;
4. Sometimes the company is happy with the test team’s efforts and sometimes it
isn’t. You’d like to “take a temperature” after each effort to get ahead of the
game and help figure out how to provide more consistent service;
5. You always have issues with the same project team or product line in
production and you’d like to figure out if what you perceive is an issue is, in
fact, an issue. If you can see the same trends or problems numerically, you’d
like to present the information to the PM and executive managers in a way that
will encourage them to take action.
These are just a few examples of the kinds of things that can at least be partially addressed by pulling and analyzing some numbers.
***DANGER…DANGER, WILL ROBINSON….***
Only you or your team (management or other) can decide if a number is “good” or “bad”. Numbers have to be interpreted intelligently. In this case, “percentage of error” is, in and of itself, meaningless. A company’s tolerance for error varies widely between industries. And even corporate culture impacts what is considered “good” or “bad”. It’s for this reason that trying to compare your own company to an “industry standard” can be such a bag of worms. No two companies are identical, so you may be trying to compare yourself to an organization ten times your size, with ten times your budget, etc. Industry standards can be interesting and can even be used to help establish a goal (“something to shoot for”), but the enormous number of types and sheer volume of companies that feed the averages that make up any “standard” means it’s not really a standard FOR YOUR COMPANY. Or ANY OTHER COMPANY. It’s nothing but a ginormous average made up of data from a lot of companies.
For that reason, I believe it is far more valuable to you and to your company to determine and establish your own “standard”.
So let’s start figuring out where you stand, statistically speaking, right now.
There are several ways to determine the percentage of error, either overall or in a given environment.
The first method, which I prefer, requires that you know how many tests (or conditions or scripts or charters - you get the idea) were run and how many errors were found. We’ll start with the most basic number – the percentage of error found across all environments. By the way, if you’re counting all of the errors found throughout the process, including production, you’re going to have to figure out when to stop counting. In our case, I reviewed all of the errors found in production for 4 weeks and found the number of problems reported due to new software installs dropped off dramatically after two weeks. So my cutoff point, at this company, is 2 weeks after new/changed code is migrated to production.
The formula is:
(# errors found)/(#tests run)*100
For example, if 50 errors were found and 200 tests were run, you would have:
50/200*100 = 25%
So your overall error rate is 25%. Is that good or bad? Neither. It’s a number. If your clients call to complain in droves, die, or your boss demands your head on a platter, I’d say it’s “bad”. If you get a board commendation, an invitation to play golf with the CEO, and your coworkers carry you around the building cheering and chanting your name, I’d say that’s “good”.
But you now have a baseline which, through analysis and interpretation, you’ve determined is good or bad for your own environment.
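If you'd rather let a script do the arithmetic, here is a minimal Python sketch of the formula above. The function name `error_rate` and the guard against zero tests are illustrative assumptions, not part of any particular tool.

```python
def error_rate(errors_found: int, tests_run: int) -> float:
    """Errors found as a percentage of tests run."""
    if tests_run <= 0:
        # Assumption: with no tests run, the percentage is undefined.
        raise ValueError("tests_run must be a positive number")
    return errors_found / tests_run * 100


# The worked example from the text: 50 errors found, 200 tests run.
print(f"{error_rate(50, 200):.0f}%")  # prints "25%"
```

The function only produces the number. Deciding whether 25% is good or bad is still up to you.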
So let’s slice and dice this data further.
Say you’d like to know, of that overall percentage, what was found in the test environment by your testing team and how much was found by your clients in production.
The formula is the same, but you use the number of errors found in each environment.
For example, say 10 errors were found by your test team in the test environment. 40 errors were found by clients in production. The math looks like this:
10/200*100 = 5% (found by your test team)
40/200*100 = 20% (found by your clients)
As a manager, I’d be interested in why our clients were finding more error than my test team. And this is one of the big benefits of taking metrics. This number would raise questions in my mind and I would go ask them. Metrics are useful for pointing out something unusual and prompting questions that need to be asked. The above situation might not be an issue, or might not be a serious issue. Perhaps your clients are expert and vocal, and the 20% are primarily enhancements they’d like to request. And perhaps it IS a problem with your testing efforts. Regardless, finding answers to questions raised through even the simplest metrics can require a great deal of investigative analysis.
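The test-versus-production split above can be sketched in a few lines of Python. The dictionary layout and the names `TESTS_RUN` and `errors_by_environment` are assumptions made for illustration; the counts are the ones from the example.

```python
TESTS_RUN = 200  # total tests run, from the example above

# Errors found, keyed by the environment where they surfaced.
errors_by_environment = {"test": 10, "production": 40}

# Same formula as before, applied once per environment.
rates = {
    env: 100 * found / TESTS_RUN
    for env, found in errors_by_environment.items()
}

for env, rate in rates.items():
    print(f"{env}: {rate:.0f}%")
```

Because both rates share the same denominator, they add back up to the overall 25%.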
Let’s take these numbers further. Perhaps you’d like to know what the percentage of error is by function. You can do this two ways. If you use the same formula as above, it would be:
(# errors found in X function)/(total # tests run)*100
Notice there is some consistency here. If you’re going to pull metrics, I’d suggest deciding on one way to get the information you need and sticking to it consistently.
There are other ways to determine the error rate of a given function, but the above method tells you the percentage of error for a function when considered for the entire testing effort.
You could also use the following math:
(# errors found in X function)/(total # tests run for X function)*100
This doesn’t tell you how X function fared in comparison with other functions tested during the test effort, but if you’re uninterested in indicators that show you (for example) that you always find more error in function A than function B, then the above formula is fine.
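To see the two formulas side by side, here is a rough Python sketch. The function names ("login", "checkout") and all of the counts are made up for illustration; they are not from any real project.

```python
# Hypothetical counts, invented purely for illustration.
results = {
    "login": {"tests": 50, "errors": 5},
    "checkout": {"tests": 150, "errors": 45},
}

total_tests = sum(r["tests"] for r in results.values())


def rate_vs_all_tests(function_name: str) -> float:
    """Errors in one function as a percentage of ALL tests run."""
    return 100 * results[function_name]["errors"] / total_tests


def rate_vs_own_tests(function_name: str) -> float:
    """Errors in one function as a percentage of that function's own tests."""
    r = results[function_name]
    return 100 * r["errors"] / r["tests"]


for name in results:
    print(f"{name}: {rate_vs_all_tests(name):.1f}% of all tests, "
          f"{rate_vs_own_tests(name):.1f}% of its own tests")
```

The same error counts give different percentages depending on the denominator, which is one more reason to pick one method and stick to it.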
If you simply want to know where MOST of your errors were found (environment A, B, or C), you don’t even need to know how many tests were run. You can use the number of errors found in this way:
(# errors found in X environment)/(total # of errors across all environments)*100
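A minimal Python sketch of this "share of errors" calculation, reusing the earlier example counts (10 errors found in test, 40 in production). The function name `error_share` and the dictionary layout are assumptions for illustration.

```python
def error_share(errors_by_environment: dict[str, int]) -> dict[str, float]:
    """Each environment's errors as a percentage of all errors found."""
    total_errors = sum(errors_by_environment.values())
    if total_errors == 0:
        # Assumption: with no errors at all, there is nothing to apportion.
        raise ValueError("no errors recorded in any environment")
    return {
        env: 100 * found / total_errors
        for env, found in errors_by_environment.items()
    }


shares = error_share({"test": 10, "production": 40})
for env, share in shares.items():
    print(f"{env}: {share:.0f}% of all errors")
```

Note that no test counts appear anywhere. This version only tells you where the errors turned up, not how error-prone the software was.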
Tired of generic math yet? Then let's move on. What can you DO with this stuff? (now, now – be polite).
I can only tell you how the companies for which I’ve worked have used them. When I started working in my current company, one of the problems they wanted to solve was “too much error in production”. The business users were complaining. The testing group was viewed as barely competent. The IT group was viewed as inadequate. Every migration was a debacle, with rollbacks, emergency patches, and the like.
So we took a “percentage of error” baseline. The results? Our error rate was over 49%. The business users found more error in production than we did.
So we dug down deeper, and found out why. I won’t go into the many issues that fed this problem, but we put together a game plan and made a proposal to management, using our error percentages as baselines and making suggestions specifically to address “bringing the error rate down”. I'd like to make the point here that one of the reasons our proposal was accepted and action taken was because our initial numbers were accepted, recognized as less than ideal (boy howdy), and were, as just numbers, a non-offensive way to get a point across. One of the reasons executive management likes numbers is the lack of emotionalism involved.
Our error rate today is between 3-4%. We are respected. The IT organization is respected. We haven’t had to roll back or interrupt service for emergency patches. My test team kicks butt.
None of that happened overnight. It took a lot of change, intelligently implemented, over time. We had the cooperation and support of executive management, development, operations, and architecture.
Metrics made those things possible and gave us a goal, “something to shoot for”, and a way to measure our progress. And it did that in a way that was not confrontational, emotional, or accusatory. Numbers are not emotional, confrontational, or accusatory. They’re just numbers. For every migration, we pulled the same metrics, in the same way, to determine if the changes we’d made seemed to make a difference. So we could see if our numbers were trending up or down. Without something like metrics, we'd just be going on our "feelings" and "opinions". There's nothing wrong with feelings and opinions. But in my experience, it's really tough to get funding based on either one.
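Pulling the same metric the same way for every migration might look something like the Python sketch below. The migration labels and counts are hypothetical, invented only to show the mechanics, and the "up/down/flat" labels are an assumed way of reporting the trend.

```python
# Hypothetical figures, invented for illustration:
# (migration label, errors found, tests run)
migrations = [
    ("migration 1", 40, 200),
    ("migration 2", 30, 200),
    ("migration 3", 36, 200),
]

# Same formula every time: errors found as a percentage of tests run.
rates = [(label, 100 * errors / tests) for label, errors, tests in migrations]


def direction(previous: float, current: float) -> str:
    """Describe how the rate moved compared with the previous migration."""
    if current < previous:
        return "down"
    if current > previous:
        return "up"
    return "flat"


# Compare each migration with the one before it.
trend = [
    (label, rate, direction(prev_rate, rate))
    for (_, prev_rate), (label, rate) in zip(rates, rates[1:])
]

for label, rate, moved in trend:
    print(f"{label}: {rate:.0f}% ({moved})")
```

A single "up" is a prompt to go ask questions, not a verdict.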
At this time, our regression test case base has become big enough that pulling the overall error rate is no longer useful for us. Consider that if you have 100,000 tests and a 10% error rate, that’s 10,000 errors. Is 10,000 errors “good”? Probably not. So I’d say that once your regression test case base becomes significant, you’d want to move to pulling metrics on an application or functional basis.
And that, to my mind, is another “rule of metrics”. If they aren’t doing anything useful for you, throw them away. Once you make progress like we did, you don’t need to prove, over and over again, that your error rate is low. I’d focus on an area that needs some improvement or investigation.
And I never get involved in Evil Metrics. From the above numbers, you can extrapolate that it would be easy to determine errors associated with a module or an individual’s work. If asked to do so, I gently refuse. I have refused. Many times. You cannot judge a developer by the number of errors that manifest in their code. They may have double the workload of the rest of the team or the most complex parts of the system to work on. The furthest level of “slicing and dicing” I do is down to a function level; one level above any chunk of code that can be attributed to a single developer.
If you’ve never done this before, why not try it one time? You don’t have to show anyone or use the information, but why not find out where you stand at the moment and get some practice in? Many people want to know what their numbers “mean”, and really, it depends on your company. There are several “industry standards” out there. I can only tell you the Linda Wilkinson industry standard. I’ve handled more than 250 projects, from retail, .com, to banking and aviation. What I’ve found is that if my error rate in production is over 20%, it’s likely that I have something messy on my hands that will cause my end users some pain. If I have less than 10%, the migration is going to “stick”. But those are based on personal experience. All it means is that if I see a number over 20%, I’m going to investigate the whys and the wherefores and see if there’s something we can do to make things better.
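That personal rule of thumb could be written down as a tiny Python helper. The function name `triage` and the wording of the labels are assumptions; the 10% and 20% thresholds are the ones quoted above, and the text gives no guidance for the range in between, so the sketch says so rather than guessing.

```python
def triage(production_error_rate: float) -> str:
    """Apply the personal 10%/20% rule of thumb to a production error rate."""
    if production_error_rate > 20:
        return "investigate"
    if production_error_rate < 10:
        return "likely to stick"
    # The text does not say what 10-20% means, so neither does this sketch.
    return "no rule of thumb given"


for rate in (25.0, 15.0, 4.0):
    print(f"{rate:.0f}%: {triage(rate)}")
```

Your own thresholds will almost certainly differ, so treat these numbers as placeholders until you have your own baseline.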
And again, I'm well aware that even one error in production, if it's a bad one, can be heinous. But that's NOT WHAT METRICS ARE FOR. Metrics do not really indicate exceptions or individual cases of either goodness or badness. They are merely averages and are useful as overall indicators. If you pull any one single instance of anything, averages do not apply. If you have an extremely small base of tests, like 2, averages will not be of benefit. If you don't have a clue as to how many tests you run, then obviously this set of metrics is useless to you. But that doesn't mean they're useless to everyone or useless in general. They just don't apply to you right now. Someday, they might come in handy, particularly if you need to figure out where you are, establish a baseline, set some goals, or make presentations to executive managers in a way they can understand and accept.
I’m going to talk about defect statistics in my next “Metric of the Month”. When I end this series, I’m going to post a copy of our metrics report, and probably wrap up with how to refute metrics that are bogus. Once you know how to run them, you also need to know how to unravel them. If your career follows the same path as mine, you’ll need both skills….
OK, everyone. Target practice is officially open!!!! Pull!!!!!
Tuesday, September 15, 2009
