Wednesday, March 3, 2010

DOOM, DESPAIR, AND AGONY ON ME....

Doom, Despair, and Agony on me -
Deep Down Depression, Excessive Misery!
If it weren't for Bad Luck,
I'd have no luck at all....
Doom, Despair, and Agony on me!


- Hee Haw

Long Time No Blog. Miss me? Hope you can sense I'm smiling; I can hear some of your comments from here...

On occasion, Life interferes with Art. I had a very serious medical issue - the past 4 months have pretty much been filled with fear, pain, and a lot of Yucky Stuff I'll spare all of you. Having never had surgery for as much as a wart before, it has been Quite the Experience.

I'm happy to say I've only missed a total of 10 days of work, and even that I spent "plugged in" via BB (Blackberry), but I haven't had the time or energy for much else.

Like many people in my position, having a medical issue means the Vultures Have Been Circling and besides facing the reality of my own mortality, I have a new appreciation for the depth of callousness that permeates some Corporate Mindsets. I'd heard stories, of course, but had never experienced it first hand. It has been an experience I could have happily done without.

But I've survived. A bit battered and considerably more jaded, but I've survived. And I'm fine.

The experience has changed me somewhat, however. First of all, the friendship, support, and caring of perfect strangers has blown me away. On the other hand, some people I cared about and trusted acted like absolute pigs. It was confusing, to say the least.

I did keep up with reading blogs, etc., but found a lot of what I've read has been same-old, same-old and generally uninspiring. The same self-promoting groups and people keep self-promoting. Blogs are written about horses that were flogged to death years ago. I spent some serious time contemplating whether it was worth my time to be involved any more, and what I really care about.

Well, overall, I find I haven't changed THAT much. I care, in essence, about testing. Not politics or affiliations. I still find that people who care more about themselves than their contributions to the field, their coworkers, peers, or staff are an offense to the nostrils. I'm still bored (and occasionally depressed) by people who jump on bandwagons with no experience or understanding of what they're touting; they just want to be on the "good list"; it's a whole lot safer.

I read the article on the ten top women testers during this time and all I could think was that we women needed to work a whole lot harder. All had contributions that were worthy of note, but tell me the truth - how many names did you recognize? And how many were TESTERS? If I had written that article about men, the hardest part of the piece would be picking only ten. And you'd recognize their names. I thought it was a sad statement on our field. The best woman tester I've ever known is someone you wouldn't know either. But I've known thousands of testers, and she's the best I've ever worked with; she's scary and her skill sets border on magic. And she's a tester; not an author, speaker, or manager. Her name is Carrie Smith. So to all the Carrie Smiths out there, a tip of the hat, a curtsey, and an "I'm not worthy" under the breath. Because we AREN'T worthy if we can't be honest, leave our prejudices and affiliations out of it, and offer everyone something we know, from the depths of our hearts, to be the truth as we know it.

So, overall, is it worth it, really worth it, to bother to try any more? To fling my professional, personal, and occasionally irreverent thoughts out into the blogosphere?

Well, why not? I've never tracked my readers - there might be 2 and there might be 200,000. I write like I'm talking to a friend; I write plenty of impressive reports like PhD dissertations and it's not much fun; blogging should be fun.

I figure some of you MUST be like me. Years of slogging in the trenches. Some of you MUST be practical people in charge of QA, QC, or testing, surviving the corporate jungle on a daily basis and doing good things for your company and your people.

So I'm back, and I'm re-dedicating my blog to all of the Real Heroes out there, doing your job and kicking ass, day after day.

And to start things off for what is MY new year, here's a problem I'm dealing with right now and what I'm doing to solve it. Maybe you've faced the same problem and have come up with other solutions - I'm hereby inviting you to tell us about it.

When I started here 5 years ago, there were 365 test cases for a ginormous system. The tests sucked and most were useless. The error rate in production was 49%. No one (staff of 4) had any test training and they were working (and being treated) like dogs.

So we trained people. We got more people. We worked hard on analysis and coverage. We have respect.

There is no impact analysis here and even the programmers have no idea how their code affects other code. And the system is highly inter-related. So the most obvious way to cut error rates was through coverage. Our error rate in production is now under 4%.

So far, so good.

But with maturity have come other problems. We now have over 17,000 test cases. While my staff has grown, we aren't a Microsoft shop, and we can't afford to constantly be adding staff to handle the ever-expanding regression test case base. And that base doesn't find the bulk of the errors - tests for the "new stuff" find the bulk of the errors. But due to the nature of our business, which involves human life, it's critically important to ensure existing functionality still functions. Because everything is so inter-related, a single error in one function can rock 22 other functions and render our systems useless.

At the same time, it's obvious we need to pull back and be smarter about what tests get run during a given test period. My company simply cannot sustain the personnel growth associated with running every test every time.

Add into that mix an offshore team of ten, who need stepped-out test cases to operate effectively. It takes one of my analysts an hour to step out a test case.

So what does the solution look like? Well, if you have a batch of regression test cases you've either written or run, don't you find you pretty much know what the test does by the title? We do. So we added a field to the test case, went through the titles, which was fast, and ranked each test according to criticality. Not rocket science, right? A kind of risk-based endeavor. For a ranking of one, if the test case failed it means the business user would be unable to work. For a two, the business user would be strongly inconvenienced and want a patch. For a 3, an error would be minor and in all likelihood would be deferred to a later release or patch.
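For those of you who think better in code, here is a minimal sketch of that ranking field in Python. The class name, field names, and sample titles are all made up for illustration; the only things taken from our process are the three ranks and what each one means.

```python
from collections import Counter
from dataclasses import dataclass

# The three ranks described above. The wording is paraphrased from the post.
RANK_MEANING = {
    1: "business user would be unable to work",
    2: "business user strongly inconvenienced and wants a patch",
    3: "minor; likely deferred to a later release or patch",
}


@dataclass
class RegressionTest:
    """Hypothetical test case record with the added criticality field."""
    title: str
    criticality: int

    def __post_init__(self):
        # Reject anything that is not one of the three agreed ranks.
        if self.criticality not in RANK_MEANING:
            raise ValueError(f"criticality must be 1, 2, or 3, got {self.criticality}")


def rank_counts(tests):
    """Count how many tests carry each criticality rank."""
    return Counter(t.criticality for t in tests)


# Made-up titles, ranked by reading the title only.
sample = [
    RegressionTest("Create new order", 1),
    RegressionTest("Report footer shows page number", 3),
    RegressionTest("Help link opens in new window", 3),
    RegressionTest("Export list to spreadsheet", 2),
]
counts = rank_counts(sample)
for rank in sorted(counts):
    print(rank, counts[rank], RANK_MEANING[rank])
```

Counting by rank is the first thing worth looking at, since it tells you how much of the test bank is even a candidate for cutting.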

We've discussed it with upper management, and we're going to drop the regression tests ranked as a "3" for this release. We'll closely track error levels in production and report back results using our standard metrics. Yes, we have standard metrics. They come in handy for improvement exercises like this.

If there's little or no difference, which is what we expect, AND our business users are satisfied, we'll whittle back the 2s. We'll do that cooperatively with the business users and development. We're also going to check the 2s against our defect log to see how many errors were found in the past few years with those tests. This will help ensure we don't "retire" any tests touching code that has been historically problematic.

Any of the tests we "retire" for a given migration may be active for the next - it depends on what is being changed. And the development and business user folks will see and approve the list.
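The per-migration decision can be sketched roughly like this in Python. Treat it as an illustration under assumptions, not our actual tooling: the `area` field, the function name, and the sample data are invented, and I've applied the defect-log check to every retired rank, although so far we have only planned it for the 2s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionTest:
    test_id: str
    criticality: int  # 1, 2, or 3
    area: str         # hypothetical: the functional area the test touches


def select_for_migration(tests, retired_ranks, changed_areas, defects_found):
    """Split tests into (to_run, retired) for one migration.

    A test in a retired rank still runs if it touches an area being
    changed, or if the defect log shows it has found errors before.
    """
    to_run, retired = [], []
    for t in tests:
        keep = (
            t.criticality not in retired_ranks
            or t.area in changed_areas
            or defects_found.get(t.test_id, 0) > 0
        )
        (to_run if keep else retired).append(t)
    return to_run, retired


# Made-up example: drop the 3s, but this migration changes "reports",
# and T4 has a history of finding errors.
tests = [
    RegressionTest("T1", 1, "orders"),
    RegressionTest("T2", 3, "help"),
    RegressionTest("T3", 3, "reports"),
    RegressionTest("T4", 3, "export"),
]
to_run, retired = select_for_migration(
    tests,
    retired_ranks={3},
    changed_areas={"reports"},
    defects_found={"T4": 2},
)
print([t.test_id for t in to_run])   # ['T1', 'T3', 'T4']
print([t.test_id for t in retired])  # ['T2']
```

The retired list is the piece that goes to development and the business users for approval, and it gets rebuilt for every migration rather than carried forward.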

All "new" stuff will be tested as usual.

What I'm working towards here is changing the nature of the work so that all tests are analyzed and intelligent decisions made as to what needs to be covered for each migration. Not every member of my staff is that evolved. We're going to grow into it.

Eventually, I want a "robust" smoke test to replace the ginormous regression test case base. Let me explain what I mean. If my staff have to test an OS upgrade, we can do that in 4-5 days. If we're testing a standard migration, it takes 5-6 weeks. In essence, my staff "know", through experience, what is and is not "important". We're formalizing that knowledge through our rankings and gradually reducing the test bank until we hit the sweet spot. That is, just enough tests (regression) to get the job done without impacting our operational excellence.

We're going to eventually end up with an intelligent tour of a given piece of functionality, complete with notes and comments about where/how functions interact. My apologies to founders and proponents of the "tour" concepts. We're backing into it by degrees.

You know the old saying that it takes longer to turn a cruise ship than a small speedboat? Well, that's true. I'm estimating it will take us a year - I have to prove each step hasn't negatively impacted production, change my staff's modus operandi, and get the rest of the company on board with us, to say nothing of keeping upper management happy.

I believe, however, the results will be the ability to handle more "new stuff" with current staff, and to keep the need for additional staff just to handle the ever-increasing regression test base to a minimum. Just that much alone would save us around 400K per year. Chicken feed to some of you, I know. But it's a major Chunk of Change to us.

In addition, I think the majority of my staff will enjoy the additional analysis work and responsibility, particularly if it means cutting back repetitive tests that bore the snot out of them anyway.

So that's the issue and our solution in a nutshell - if you have other/better ideas, feel free to drop me a line and I'll certainly give you the airspace!

My next blog is going to talk about cost/benefit analysis for automation. This is something I've done many times for many companies (and the final recommendations aren't always what you think), but I'm going to talk about where we are in my present situation and the questions we're being asked by upper management.

And it's nice to be back. Now that I think of it, it's nice to be anywhere!

Tuesday, October 27, 2009

LINDA WRITES A FAIRY TALE…

For those of you that don’t know, I’ve contributed a chapter to the upcoming book “Beautiful Testing” along with a bunch of other people you all probably know pretty well. It was an interesting effort, since contributors were all over the map in terms of their perceptions of the industry, and what I liked about the effort was that all of the proceeds go to charity. So everyone put their differences aside and did Something Good to benefit someone other than themselves. I must report to you, however, that writing for a book of this kind is somewhat painful. Besides the effort to write the chapter, we all had pretty stringent deadlines, several reviews by the editors, specific formats, etc. One of the final steps has been having a handful of official reviewers of the overall book; two developers and two non-developers.

The developers had quite a bit to say about my chapter.

Ahem.

One said they really wanted to read a story about someone who didn’t have the skills of a good tester and how they acquired them (you know, something “real”), and the other was appalled by my unabashed enjoyment of working on a SWAT team with Attitude. The non-developers had only good things to say…

Well, after giving it a lot of thought, like for ten or twelve seconds, I decided that while I was not going to change my book chapter, I would write a blog entry that gave both of the development reviewers what they really wanted.

A fairy tale.

I hope you’ll all bear with me. I’m not too experienced with works of fiction.

Once upon a time, there was a programmer named Dufus. Dufus wasn’t a very good programmer. He dropped objects he didn’t understand into his code just like all of his little friends, but his code was very very sick and he wasn’t smart enough to make it well again. The testers were mean to him. His boss was mean to him. The other programmers made fun of him.

Dufus was very very sad.

One day, Dufus’s boss, Mr. Dunderhead, decided that Dufus wasn’t a very good programmer, so it was likely he would be a very good tester. After all, testers aren’t really responsible for anything, otherwise they’d be business users, project managers, or programmers.

So Dufus joined the testing team.

Now since Dufus wasn’t the brightest bulb in the pack, reading specifications, or anything else, were not some of his very Most Favorite Activities. He didn’t like to ask questions about how things should work, especially from his old buddies in development, or those scary, weird business users, for fear of looking stupid. And he didn’t like or trust his new team, since they didn’t seem to get the zeitgeist of “The Farmers and the Cowboys Should Be Friends” (yee-haw). They seemed, well, CRITICAL of the software produced by his old team.

After the code he tested and said was OK broke production 47 times, his kindly new boss, Black Bart, explained to him that he was obviously a “test tool” and set him to executing the step-by-step test cases written by his colorful and irreverent crew. It took a while for Dufus to understand him, as he didn’t understand the lingo, which appeared to be interspersed with dozens of ways to say the same thing. Dufus had difficulty understanding even one way. Eventually, however, after the boss’s trained parrot repeated it enough times, and he was slapped repeatedly by his test lead’s winged monkeys, Dufus got the idea.

So Dufus officially became a Tool.

However, as Dufus wasn’t particularly observant OR much of a reader, and since he had a lot of experience with code, he might or might not run a given test, depending on whether he thought it might break something. And he really didn’t want to break anything; his old buddies didn’t like it much, and even when he did ask them about it, what he had found was never really a problem anyway. Dufus was not surprised. Everyone knows development rarely makes mistakes.

After his second try, when the code he tested and said was OK broke production 62 times, his new boss kindly drilled him a brand, spanking-new orifice, since it appeared Dufus couldn’t listen very well with his existing orifices, and his testing approach seemed somewhat constipated.

But his new orifice was kind of painful, his new teammates, although they respectfully referred to him using his new title “The Tool”, didn’t seem to like him much, and his old friends were still making fun of him, but not in the same way. The winged monkeys were giving him the finger whenever they thought he wasn’t looking. And he didn’t like the fish-eye he was getting from the parrot.

Dufus was very very sad.

One day, while sitting disconsolately in his little cube, staring without comprehension at his little PC, he heard a roaring sound outside his cube. Some tattooed, leather-clad dude on a Harley went flying by and tossed a bunch of pearls on his desk.

Dufus grunted.

What were these pearls? They looked good. They smelled good.

They must be Magic Pearls!!!!

So Dufus ate them.

Clutching his stomach in pain, Dufus made a run for the restroom. He barely made it. To his surprise, he yakked up a Giant Hairball of test ideas! And suddenly he felt, well, he felt GOOD. He tidied up the ideas and presented them to his boss, Black Bart, who thoughtfully fingered his gold earring and said “Arrh”….

The parrot hopped onto Dufus’s shoulder and affectionately bit his ear.

Dufus inexplicably found himself naming his goldfish “Aristotle” and using his heretofore unread specifications for more than just a paperweight. He signed up for the Oxford English Dictionary on-line. He had questions about everything. During the next testing release, he wrote 192 defect reports and made his old manager, Mr. Dunderhead, sob openly into his frayed polo shirt. Production didn’t break at all. The business users, Lola and Lily, thanked him in ways he’d only seen on the Playboy Channel. His new team took him to lunch. The winged monkeys made him coffee. He became smarter, stronger, and better-looking. Dufus was very very happy.

And They All Lived A Quality Life Happily Ever After.

The End.

The moral of my story?

Can’t you guess?

“The Only Way a Dufus Can Become a Good Tester is By Magic.”

All, I hope you enjoyed my Very First fairy tale and I hope you’ll read the book and see what made the developers feel, well, just a tad uncomfortable… If my particular perspective doesn’t do it for you, there are many talented people in the field who have contributed to “Beautiful Testing” and the proceeds go to a very good cause, which is cure/prevention of malaria. My off-shore lead had to recover from a combined bout of malaria and typhoid, so this cause is closer than one might think. I haven’t read everyone’s contribution as yet, but as a group we couldn’t be more different; I expect it to be interesting. The experience certainly was interesting – we had Politics, we had Disagreements, and we had Drama. I’m thinking that probably the only thing we could all wholeheartedly agree to is that we never want to share a hotel room.

I’m Chapter One, and it’s called “Was It Good For You?”. I hope you’ll consider buying a book; none of us are getting a dime out of it and it’s for a very good cause. You can pre-order through Amazon and the book comes out on Friday, October 30...

Enjoy.

Friday, September 25, 2009

EVOLUTIONAL TESTING

It’s time to move on….

Lies, crap, and software testing. I’m Totally Over the drama queen paranoia and finger-pointing. Then again, it’s been a while since we’ve wallowed in this kind of engineered soap opera, so I guess it was overdue.

The blogs lately have been heinous. I think a few people got their nose hairs tweaked because everyone isn’t totally excited about “checks”. To top it off, someone wrote a pretty good book that used the word “exploratory” and wasn’t blessed by The King. No, not Elvis Presley. Elvis has checked out of the test lab.

Truly, these have got to be the least exciting excuses for a turf war I’ve ever seen.

I’m not sure I recognize that there is only one “exploratory testing group”. The testing field is not a monarchy. There is no Supreme Ruler who gets to decide who is “in” and who is “out”. There is no Heidi Klum of testing (sorry, guys).

Does the exploratory testing approach “belong” to anyone? Before it was exploratory testing it was called ad hoc testing. It existed before it was renamed.

My own thought is that no one owns this field, or even a piece of it. How can anyone believe the only worthwhile ideas in the past 20 years belong to one person or those they have blessed? Software testing isn’t a papacy either, and I don’t know about you, but I’m not kissing any rings. No one has the right to say person X, Y, or Z doesn’t have a clue because what they say is not what one person has either said or endorsed.

As far as the checks go, no one is “required” to care about the work or discussions going on in any one particular group and everyone is free to either support or not support whatever they choose. They have the right to say “that is not applicable to me” or “we don’t do things that way”. At the same time, no discussions need to be “shut down” because someone else isn’t interested in them. Have some balls. If it interests you, say so. Get involved. Go for it. If it doesn’t, you have a right to say so and move on.

In my case, I’ve been doing this a long time. Sometimes I forget everyone else hasn’t been doing this for as long as I have. That will be true of others as well. Sometimes people will be uninterested in an idea or concept because they’re way ahead of where you are, conceptually speaking. That doesn’t mean you aren’t going to surpass them. It means you have to catch up first. THEN you can pass them by, waving cheerfully. And the tables will turn and they’ll be behind you. Learning everything they can from you.

I’ve also observed that it doesn’t really matter whether a given idea has merit or not; some people fall in love with their own ideas and won’t give them up no matter what. They can pursue red herrings for years. So what? It’s their dime. Use their ideas or not; only YOU can decide what makes sense for you and your environment. There are also people (distributed equally, I might add, between “exploratory” and “other” groups) who are so locked into the way THEY do things that they are completely incapable of trying anything another way or recognizing anything of value outside their own little world. Again, so what? Those kinds of people exist in EVERY industry. Shrug your shoulders and move on. Be thankful you aren’t similarly handicapped.

I see people I respect bemoaning the lack of progress in the last 20 years, while they inadvertently feed their own perceived stasis. How can the field “progress” if only a small group of people are “allowed” to move it forward? The reality is that everyone has moved, and is moving, forward. Some can just recognize those changes more readily than others.

And why the sudden concentration on the 70s if you want to be insulting? In the 70s, testing in my neck of the woods consisted of development staff desk-checking code. The changes some people are trying hard to disparage occurred on a more wide-spread basis in the 80s. Regardless, what difference does it make when something started? Wasn’t the term “exploratory testing” coined 20 years ago? Was that crap too? And wasn’t the highly-respected person who coined it also an “academic”? Ah. Some academics are OK and some are not, is that it? Isn’t that a bit, well, hypocritical?

This is the kind of stuff I’m reading. And it leaves me totally cold. It’s a particularly revolting display of the problems with our field in general.

Can’t we leave that kind of crap behind? I think it’s time to open things up. Including our minds.

Evolution is an interesting thing. You can start it, you can contribute to it, but you can’t always predict how things will turn out. Things can evolve around you, and surpass you. You can’t always direct evolution. Sometimes it explodes and divides in unexpected ways.

It’s time to move on.

I want the freedom, and I CLAIM the freedom, to incorporate whatever ideas I choose into my work. I don’t need anyone’s “permission” to call it whatever I choose. I don’t really care if your idea of exploratory testing doesn’t match that person’s over there. So what? Maybe you both have ideas that would be useful for others. Maybe they’re both flavors of exploratory testing. MAYBE THE FIELD IS EVOLVING.

I am going to continue to use those ideas I find useful and discard those I don’t. It doesn’t matter to me if they’re approaches, methodologies, or techniques. What I’ve found, for myself, is that I’m most interested in techniques. An approach doesn’t mean much if you have no way to implement or support it. But you have to have some concept of approach first and there are others more interested in defining high-level approach. They’re the “idea men”. It’s up to everyone else to figure out how to make it work. And that’s OK. Regardless, when I’m interested in an idea, I don’t care if they’re blessed or pagan. I am going to do whatever makes sense in the environment I work in. Those of you totally locked in to your own little world, be it structured or exploratory, are going to be eating the dust of people like me. Is that what you want? Some of us don’t recognize any boundaries. We’re free to appreciate and use it all, or ditch whatever doesn’t make sense. We can think our own thoughts without asking permission, discuss and work on whatever we choose, and call whatever we’re doing whatever we want to call it.

I believe I’m going to call my own approach “Evolutional Testing”. That way I won’t step on any hyper-sensitive toes. And by the way, feel free to use that term without building me a shrine or bowing to the east whenever you use it. It means I will use whatever approaches, methodologies, and techniques make sense for what I’m doing and I’ll evolve as necessary to do the work more successfully, efficiently, and cost-effectively. Right now, I use ideas that are 25 years old. I use ideas that are 6 months old. And everything in-between. I’ve been helped or influenced by more people than I could say. I’ve developed my own techniques for a variety of issues I face every day. Altogether, I can see how I’ve evolved during the course of my career – as a tester, a person, and a manager. If all were right with the world, that would be what “contextual” testing meant. But I don’t meet many contextual testers that are actually contextual. Do you?

There isn’t anyone in this field I consider a “rival”. I’ll be perfectly happy for you if you succeed and you’re welcome to be or become Much Bigger than I am. I’m free to appreciate what every one of you brings to the table. I am happy with what I’ve accomplished personally and know I have contributed value to this field, whether it’s on a big scale or a little scale. What’s more, I feel YOU provide value to this field.

So take that knowledge and confidence in yourself and do something great with it. Why not absorb it all? Take it all in, play around with it, stick holes in it, add stuff to it, and mold it into what you need it to be. Share your thoughts, good and bad, with the field. Make yourself better. Make the field better. But I sincerely hope you do not allow anyone to limit you in what you choose to do. No one and everyone in this field “owns” testing, exploratory or other. Grab it and run with it. EVOLVE.

And the on-going soap opera? Well, let’s sit down in a comfy chair, get ourselves an adult beverage, and throw peanuts at our PC screens...

Monday, September 21, 2009

THAT’S THE WAY (UH-HUH, UH-HUH) I LIKE IT

Oh, hi, everyone. Excuse me whilst I remove my platform shoes.

More stuff on the differences between manual testing and checking? Argh. I wrote an entire blog (which was brilliant by the way – sorry you won’t be able to read it) on the topic, but decided not to post it as I wrote it. Damn Michael Bolton and James Bach anyway. If you haven't read their blog entries on this topic, you'll need to do that first, or this won't make much sense.

After reading their blogs, I sat around and thought about it for a while and decided my response was knee-jerk.

I understand the concept. It’s not hard to understand. I believe it likely James Whittaker and everyone else understands it too. One of the most offensive (and repetitive) statements made by James Bach implies that people who disagree with him don't understand him. In other words, "anyone who disagrees with me is stupid".

Well, I understand, all right. I’m just not sure I care. Maybe that's how Mr. Bach's "rivals" feel too.

But I’ll make a concession.

There IS a difference, when contemplating manual testing, between what one might call “sapient” tests, which are being done for the first time and require concentration, analysis, and intelligence, and “rote” tests that have been repeated multiple times and where you are verifying a system reaction has been unaffected by changes or new code. The latter takes less attention than the former. The latter can be very boring. We try to automate the latter. We give the latter to people new to the function. We cut the latter where we can. So I concede that I have different strategies for handling these two categories as they have defined them. Same thing with automated testing.

But here’s the thing. I already have terms for those differences and I already have different strategies for handling them. But if this is the first time some people have ever recognized those differences, well, what can I say? This is old stuff to me, with new terminology. If it helps some bright people come up with new and better ways to handle “checks”, new techniques that cut time, resources, or costs without throwing away the knowledge and NECESSARY checks those “rote” tests provide, I will concede it was a Good Thing.

I do want to say however, that I’ve noticed that when the pickings get slim, someone who hasn’t thought of anything new for a while picks up something old and tries to MAKE it new. I can’t help wondering if it was either “checks” or comparing testing to raising ferrets.

I have to say that my mind isn’t blown. In fact, I’m kind of bored with this particular topic as it stands right now. And just as an inconsequential aside, I don’t think an on-line account to the OED (Oxford English Dictionary) is particularly “sweet” either. I think “free” is sweet.

Other (no doubt lesser) dictionaries define “manual” as involving and using human effort, skill, power, energy, etc. Or “done by hand as opposed to machine”. Overall, manual testing does not evoke the concept of unskilled physical labor. I encourage you guys to come on down from that ivory tower and join the hoi polloi. Those kinds of statements make me think you’re out of touch.

I’d also like to make the point that intelligence and sapience are two different things. You can be the brightest bulb in the pack and still continue to plug into the wrong sockets. Sapience requires some common sense.

So anyway, I’m “bracing myself for insight”. Maybe some will be forthcoming soon. At least some solutions to the issues of minimizing time/money/resources spent on “checks” would provide some benefit to the field.

Now, if you'll excuse me, I'm going to get my disco ball going...

Rock on.

Tuesday, September 15, 2009

METRIC OF THE MONTH – ERROR RATES

So in the last couple weeks, Metrics have become a candidate for the 7th plague.

Good God, y’all.

It rarely bothers me when someone disagrees with me. I actually enjoy debating and talking about our field. Sometimes after a really great discussion, I find I was wrong. That’s part of what makes it all so much fun. But I have pretty strong feelings in regards to doing what you need to do to make yourself, your team, and your company successful and that does not include being stubborn about metrics. Overall what I feel in regards to fear and loathing of metrics is…pity. I hate seeing extraordinary talent limit themselves. And avoiding metrics can be a career-limiting move, particularly if management is your goal.

So it’s up to you – willing to listen and maybe add a few tools to your tool belt? You don’t HAVE to use every tool you’ve got, just be comfortable enough to use them when you need them. Yes? Great - then let’s take a look at some metrics that can Do Good Things for you, your team, and your company.

There are a bizillion different types of metrics out there and I’m going to start with one of the most common – percentage of error. It can be the overall percentage of error, the error in a given environment (usually production), or the percentage of error for a given project/product effort. I’ll start out with the most basic concept and we’ll slice and dice it down into a few commonplace sets of numbers.

Most of the “useful” metrics I’ve seen are pulled to help answer a question or help diagnose a problem. They can also be used as a way to monitor progress; we’ll talk about that later.

When you’re interested in the percentage of error overall or in a given environment, generally it is because you have questions or problems such as:

1. There have been an unusually high number of problems reported by your clients
(whether those are internal or external) and you’re trying to figure out what
happened;

2. People are complaining that the test team is missing too many errors and you’re
not sure what is true or where to focus your efforts;

3. Either you or another group has been working hard on doing (something) better
and you’d like to see if it made a difference;

4. Sometimes the company is happy with the test team’s efforts and sometimes it
isn’t. You’d like to “take a temperature” after each effort to get ahead of the
game and help figure out how to provide more consistent service;

5. You always have issues with the same project team or product line in
production and you’d like to figure out if what you perceive is an issue is, in
fact, an issue. If you can see the same trends or problems numerically, you’d
like to present the information to the PM and executive managers in a way that
will encourage them to take action.

These are just a few examples of the kinds of things that can at least be partially addressed by pulling and analyzing some numbers.

***DANGER…DANGER, WILL ROBINSON….***

Only you or your team (management or other) can decide if a number is “good” or “bad”. Numbers have to be interpreted intelligently. In this case, “percentage of error” is, in and of itself, meaningless. A company’s tolerance for error varies widely between industries. And even corporate culture impacts what is considered “good” or “bad”. It’s for this reason that trying to compare your own company to an “industry standard” can be such a bag of worms. No two companies are identical, so you may be trying to compare yourself to an organization ten times your size, with ten times your budget, etc. Industry standards can be interesting and can even be used to help establish a goal (“something to shoot for”), but the enormous number of types and sheer volume of companies that feed the averages that make up any “standard” means it’s not really a standard FOR YOUR COMPANY. Or ANY OTHER COMPANY. It’s nothing but a ginormous average made up of data from a lot of companies.

For that reason, I believe it is far more valuable to you and to your company to determine and establish your own “standard”.

So let’s start figuring out where you stand, statistically speaking, right now.

There are several ways to determine the percentage of error, either overall or in a given environment.

The first method, which I prefer, requires that you know how many tests (or conditions or scripts or charters - you get the idea) were run and how many errors were found. We’ll start with the most basic number – the percentage of error found across all environments. By the way, if you’re counting all of the errors found throughout the process, including production, you’re going to have to figure out when to stop counting. In our case, I reviewed all of the errors found in production for 4 weeks and found the number of problems reported due to new software installs dropped off dramatically after two weeks. So my cutoff point, at this company, is 2 weeks after new/changed code is migrated to production.

The formula is:

(# errors found)/(#tests run)*100

For example, if 50 errors were found and 200 tests were run, you would have:

50/200*100 = 25%
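If you would rather let a script do the arithmetic, the formula can be sketched in a few lines of Python. The function name `error_rate` is made up for illustration and assumes you already have both counts in hand; it multiplies by 100 before dividing, which is the same math but keeps the floating-point result tidy.

```python
def error_rate(errors_found, tests_run):
    """Percentage of error: (# errors found) / (# tests run) * 100."""
    if tests_run <= 0:
        # With nothing to count, there is no metric to pull.
        raise ValueError("tests_run must be greater than zero")
    return 100 * errors_found / tests_run


# The worked example from the text: 50 errors found across 200 tests.
print(error_rate(50, 200))  # 25.0
```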

So your overall error rate is 25%. Is that good or bad? Neither. It’s a number. If your clients call to complain in droves, die, or your boss demands your head on a platter, I’d say it’s “bad”. If you get a board commendation, an invitation to play golf with the CEO, and your coworkers carry you around the building cheering and chanting your name, I’d say that’s “good”.

But you now have a baseline, which through analysis and interpretation, you’ve determined is good or bad for your own environment.

So let’s slice and dice this data further.

Say you’d like to know, of that overall percentage, what was found in the test environment by your testing team and how much was found by your clients in production.

The formula is the same, but you use the number of errors found in each environment.

For example, say 10 errors were found by your test team in the test environment. 40 errors were found by clients in production. The math looks like this:

10/200*100 = 5% (found by your test team)

40/200*100 = 20% (found by your clients)
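The same split can be sketched in Python. The dictionary keys and the helper name `error_rate` are illustrative assumptions; the counts are the ones from the example above.

```python
def error_rate(errors_found, tests_run):
    """Percentage of error: (# errors found) / (# tests run) * 100."""
    return 100 * errors_found / tests_run


tests_run = 200
# Errors from the example: 10 found by the test team, 40 found by clients.
errors_by_environment = {"test": 10, "production": 40}

rates = {
    env: error_rate(count, tests_run)
    for env, count in errors_by_environment.items()
}

for env, rate in rates.items():
    print(f"{env}: {rate:.0f}%")

# The slices add back up to the overall error rate of 25%.
overall = sum(rates.values())
```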

As a manager, I’d be interested in why our clients were finding more error than my test team. And this is one of the big benefits of taking metrics. This number would raise questions in my mind and I would go ask them. Metrics can be of benefit to point out something unusual and spawn questions that need to be asked. The above situation might not be an issue, or might not be a serious issue. Perhaps your clients are expert and vocal, and the 20% are primarily enhancements they’d like to request. And perhaps it IS a problem with your testing efforts. Regardless, finding answers to questions raised through even the most simple metrics can require a great deal of investigative analysis.

Let’s take these numbers further. Perhaps you’d like to know what the percentage of error is by function. You can do this two ways. If you use the same formula as above, it would be:

(# errors found in X function)/(total # tests run)*100

Notice there is some consistency here. If you’re going to pull metrics, I’d suggest deciding on one way to get the information you need and sticking to it consistently.

There are other ways to determine the error rate of a given function, but the above method tells you the percentage of error for a function when considered for the entire testing effort.

You could also use the following math:

(# errors found in X function)/(total # tests run for X function)*100

This doesn’t tell you how X function fared in comparison with other functions tested during the test effort, but if you’re uninterested in indicators that show you (for example) that you always find more error in function A than function B, then the above formula is fine.
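To see how much the choice of denominator matters, here is a small Python sketch that runs both formulas side by side. The counts for function X (6 errors, 30 tests) are invented purely for illustration, and the function names are hypothetical.

```python
def rate_against_whole_effort(errors_in_function, total_tests_run):
    """(# errors found in X function) / (total # tests run) * 100."""
    return 100 * errors_in_function / total_tests_run


def rate_within_function(errors_in_function, tests_run_for_function):
    """(# errors found in X function) / (total # tests run for X function) * 100."""
    return 100 * errors_in_function / tests_run_for_function


# Made-up numbers: function X had 6 errors from 30 of its own tests,
# out of 200 tests run in the whole effort.
errors_x = 6
tests_x = 30
total_tests = 200

whole_effort = rate_against_whole_effort(errors_x, total_tests)
within_function = rate_within_function(errors_x, tests_x)

print(f"Against the whole effort: {whole_effort:.0f}%")
print(f"Within function X only:   {within_function:.0f}%")
```

Same errors, very different percentages, which is one more reason to pick a method and stick with it.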

If your questions involve curiosity as to where MOST of your errors were found, environment A, B, or C, you don’t even need to know how many tests were run. You can simply use the number of errors found in this way:

(# errors found in X environment)/(total # of errors across all environments)* 100
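A minimal Python sketch of this one, reusing the 10 and 40 from the earlier example. The function name `share_of_errors` and the environment labels are assumptions for illustration.

```python
def share_of_errors(errors_by_environment):
    """Each environment's share of all errors found, as a percentage."""
    total_errors = sum(errors_by_environment.values())
    if total_errors == 0:
        # No errors anywhere means there is nothing to apportion.
        return {env: 0.0 for env in errors_by_environment}
    return {
        env: 100 * count / total_errors
        for env, count in errors_by_environment.items()
    }


shares = share_of_errors({"test": 10, "production": 40})
for env, share in shares.items():
    print(f"{env}: {share:.0f}% of all errors")
```

Notice that the number of tests run never appears; only the error counts are needed.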

Tired of generic math yet? Then let's move on. What can you DO with this stuff? (now, now – be polite).

I can only tell you how the companies for which I’ve worked have used them. When I started working in my current company, one of the problems they wanted to solve was “too much error in production”. The business users were complaining. The testing group was viewed as barely competent. The IT group was viewed as inadequate. Every migration was a debacle, with rollbacks, emergency patches, and the like.

So we took a “percentage of error” baseline. The results? Our error rate was over 49%. The business users found more error in production than we did.

So we dug down deeper, and found out why. I won’t go into the many issues that fed this problem, but we put together a game plan and made a proposal to management, using our error percentages as baselines and making suggestions specifically to address “bringing the error rate down”. I'd like to make the point here that one of the reasons our proposal was accepted and action taken was because our initial numbers were accepted, recognized as less than ideal (boy howdy), and were, as just numbers, a non-offensive way to get a point across. One of the reasons executive management likes numbers is the lack of emotionalism involved.

Our error rate today is between 3-4%. We are respected. The IT organization is respected. We haven’t had to roll back or interrupt service for emergency patches. My test team kicks butt.

None of that happened overnight. It took a lot of change, intelligently implemented, over time. We had the cooperation and support of executive management, development, operations, and architecture.

Metrics made those things possible and gave us a goal, “something to shoot for”, and a way to measure our progress. And it did that in a way that was not confrontational, emotional, or accusatory. Numbers are not emotional, confrontational, or accusatory. They’re just numbers. For every migration, we pulled the same metrics, in the same way, to determine if the changes we’d made seemed to make a difference. So we could see if our numbers were trending up or down. Without something like metrics, we'd just be going on our "feelings" and "opinions". There's nothing wrong with feelings and opinions. But in my experience, it's really tough to get funding based on either one.

At this time, our regression test case base has become big enough that pulling the overall error rate is no longer useful for us. Consider that if you have 100,000 tests and a 10% error rate, that’s 10,000 errors. Is 10,000 errors “good”? Probably not. So I’d say that once your regression test case base becomes significant, you’d want to move to pulling metrics on an application or functional basis.

And that, to my mind, is another “rule of metrics”. If they aren’t doing anything useful for you, throw them away. Once you make progress like we did, you don’t need to prove, over and over again, that your error rate is low. I’d focus on an area that needs some improvement or investigation.

And I never get involved in Evil Metrics. From the above numbers, you can extrapolate that it would be easy to determine errors associated with a module or an individual’s work. If asked to do so, I gently refuse. I have refused. Many times. You cannot judge a developer by the number of errors that manifest in their code. They may have double the workload of the rest of the team or the most complex parts of the system to work on. The furthest level of “slicing and dicing” I do is down to a function level; one level above any chunk of code that can be attributed to a single developer.

If you’ve never done this before, why not try it one time? You don’t have to show anyone or use the information, but why not find out where you stand at the moment and get some practice in? Many people want to know what their numbers “mean”, and really, it depends on your company. There are several “industry standards” out there. I can only tell you the Linda Wilkinson industry standard. I’ve handled more than 250 projects, from retail, .com, to banking and aviation. What I’ve found is that if my error rate in production is over 20%, it’s likely that I have something messy on my hands that will cause my end users some pain. If I have less than 10%, the migration is going to “stick”. But those are based on personal experience. All it means is that if I see a number over 20%, I’m going to investigate the whys and the wherefores and see if there’s something we can do to make things better.
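That personal rule of thumb can be written down as a tiny Python sketch. The 20% and 10% thresholds come straight from the paragraph above and are one person's experience, not a standard; the function name, the labels, and the treatment of anything between 10% and 20% as a judgment call are assumptions added for illustration.

```python
def read_production_error_rate(rate_percent, investigate_above=20.0, comfortable_below=10.0):
    """Turn a production error rate into a prompt for action, not a verdict."""
    if rate_percent > investigate_above:
        # Over 20%: go find out the whys and the wherefores.
        return "investigate"
    if rate_percent < comfortable_below:
        # Under 10%: the migration is likely to stick.
        return "likely to stick"
    # The text gives no rule for the middle band; this label is an assumption.
    return "judgment call"


print(read_production_error_rate(25))  # investigate
print(read_production_error_rate(5))   # likely to stick
```

Swap in your own thresholds once you know what your own company tolerates.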

And again, I'm well aware that even one error in production, if it's a bad one, can be heinous. But that's NOT WHAT METRICS ARE FOR. Metrics do not really indicate exceptions or individual cases of either goodness or badness. They are merely averages and are useful as overall indicators. If you pull any one single instance of anything, averages do not apply. If you have an extremely small base of tests, like 2, averages will not be of benefit. If you don't have a clue as to how many tests you run, then obviously this set of metrics is useless to you. But that doesn't mean they're useless to everyone or useless in general. They just don't apply to you right now. Someday, they might come in handy, particularly if you need to figure out where you are, establish a baseline, set some goals, or make presentations to executive managers in a way they can understand and accept.

I’m going to talk about defect statistics in my next “Metric of the Month”. When I end this series, I’m going to post a copy of our metrics report, and probably wrap up with how to refute metrics that are bogus. Once you know how to run them, you also need to know how to unravel them. If your career follows the same path as mine, you’ll need both skills….

OK, everyone. Target practice is officially open!!!! Pull!!!!!

Monday, August 31, 2009

AH, MY BEAUTIFUL WICKEDNESS….

Do you remember the Wicked Witch in the Wizard of Oz when she’s brought down by a little girl? She laments the loss of her beautiful wickedness….

Well, the Evil Troll is writing a book report, and I have to say that it’s not very evil this time. I’d be afraid I was losing my edge, but fortunately, I’ve read all the recent blogs/articles regarding the differences between “testing” and “checking” and realize all my usual critical synapses are still there. I find the constant attempts to make “the way I think testing should be done is the only Real Way and everyone else sucks rocks” pathetic, to say the least. As far as I can tell, the latest is that anyone who uses specs and doesn’t follow one author’s vague ideas as to what constitutes real testing is a “checker”. The dictionary says testing is “the trial of the quality of something” (the author of the blog would gag), and that checking is “to verify, examine, or investigate”. So it seems to me this is all semantics and frankly, it doesn’t help anyone become a better tester, manager, or person and just feeds a bunch of rifts that aren’t necessary. It does not, in my opinion, add anything to the field, unless you want to consider blog fodder a service.

I have, however, just read something that does add something to the field. Don’t want to be associated with “old school” testing, but unwilling to toss your specs, your test cases, or everything else you’ve found works for you away? Like to think about the work you do in some different ways and try a few new techniques?

Enter “Exploratory Software Testing” by James Whittaker.

Dr. Whittaker was kind enough to send me part of this book because of my interest in tours; when I commented back, he asked if he could use the comment along with those of other reviewers for his book and I said yes. Those of you who read this blog know how anal I am about promoting work I don’t support or feeling obligated to say nice things when I want the freedom to be honest. Well, Dr. Whittaker understood that too and knew that comments on a few chapters of the book wouldn't necessarily guarantee I'd like the rest. In all honesty, I genuinely liked this book a lot and it gets a 5 out of 5 on the Wilkinson Scale of Goodness. That’s right; while it hasn’t toppled “Lessons Learned in Software Testing”, it’s right up there with it and it’s the best book I’ve read so far this year. And I read a LOT.

That does not mean I agreed with every single thing that was said. I rarely agree with every single thing someone has to say and I would hope all of you are the same way. I don’t expect others to agree with everything I say either. I’d be bored.

I have to admit right here that I’ve been a James Whittaker fan for years. I’ve heard him speak at least 5 times and I have all of his books. You want to know WHY I like him? First of all, he makes me laugh while he’s proving a point. Secondly, he’s smart and innovative without being impractical. I can DO the stuff he’s talking about. I can apply it to my everyday job. So can my staff. That doesn’t mean I’ve loved everything he’s ever written. I found his first book mighty dry and hard to read, even though the ideas were good. So is it still possible I'm prejudiced in his favor in terms of reviews? Maybe. But it's not really in character.

I guess that all I have to say is wait until you read this one. It starts with general concepts and moves into specific ideas as to how to find specific types of errors. It goes from there into the future of testing. Even the appendices are interesting, amusing, and worth reading. It doesn’t try to get you to throw away your specs or test cases, but gives you ideas as to how to modify what you do now to make your testing better. It has ideas for very structured testers and ideas for very agile testers; concepts that can be applied across the board. And that, overall, is what I liked about this book. No one is trying to “convert” you. The Nice Doctor is trying to help you. And the man loves to help you find bugs. That’s my kinda guy. And you can throw away your dictionaries, old philosophy books, or anything else that makes you want to take a Tylenol. The language is accessible and the concepts clear.

What I’m saying is that the book is sane. What’s more, it’s filled with phrases and situations that will either make you smile, or make you laugh out loud.

The only area in which I disagree and advice I won’t even consider is hiring degreed engineers for test positions. I have no prejudice either for or against degrees or certifications, but I have to say that a piece of paper is no guarantee of intelligence, ability, or technical competency. I understood the “why” of his advice, but won’t be taking it. I would have missed out on some extraordinary talent that way.

It’s refreshing to be able to heartily recommend something and to have invested some time in something that will be of use to me and to my team. My team and I are going to be working on some of the concepts, specifically tours, found in this book and I’ll certainly be sharing some of our successes and failures through my blog. Why will we be doing that? Our regression test case base is getting too big to manage effectively. While the “problem” we had starting out was too many errors in production due to insufficient testing and understanding of existing features, we solved those and our “problems” now are the time and expense required for regression testing and how to pass on our domain/testing knowledge both amongst ourselves and to new staff. In other words, we think we’re good, but we think we can be even better. As many of you know, I support metrics and we’ll be able to tell whether we’re improving or degrading our service through those metrics.

It’s US $39.95, folks, and I think it’s worth the time and money. Give it a read and let me know what you think.

As a final note, I rarely talk about my guilty pleasures and some of my favorite reads, and I don’t like to maintain blog lists, since I feel obligated to support anyone I advertise for (it affects what I can say and how) and to make sure they blog on a regular basis and I’m not recommending some link with nothing new since 2006. Since my general mood today is Totally Mellow after reading something I really liked, I’d like to at least mention my own “regular” reading list. I’ve already said I like James Whittaker. I read QAHATESYOU regularly (that site epitomizes the meaning of the word “tester”, including the sardonic humor). I read BJ Rollison, Alan Page, and James Bach. Yup, you read that correctly. One of my favorite writers of all time is Cem Kaner, although he doesn’t write as much for general consumption as I wish he would. I like Erik Peterson quite a bit. Scott Barber and Anne-Marie Charrett are on my list. I like Corey Goldberg. I belong to the “School of Joe” (Joe Strazzere). And if I were going to pick one person in the field right now that I’m going to keep an eye on (I think he’s going to be great), it would be Rob Lambert. I like the fact he feels free to comment on whatever he thinks and I like the way he makes those comments. He manages to be a bit more PC (politically correct) than most without being timid; an unusual quality.

If you blog regularly, chances are I’ve read or do read your work. These are just favorites off the top of my head; part of the reason I hate to post these kinds of things is that I have a lot of friends in this field and no desire to insult anyone. Considering the nature of some of my posts, that might surprise you. But the reason I feel free to say what I think is that I'm so reluctant to align myself with any one person or group. My list has nothing to do with whether you’re “good”. It has to do with what I normally check out regularly and what I remembered in five minutes.

By the way, lest you think I truly have lost all my Beautiful Wickedness, I’m through with the love-fest for the day. I’m about done with my first “Metric of the Month”, so I doubt I’ll be feeling especially mellow afterwards. Plucking arrows out of my back (or forehead) is usually not conducive to “mellow”. And I have two less…um…sweet and kind blog posts waiting for just the right moment to publish. I'm really fond of one of them; it's a fairy tale.

Hope you read “Exploratory Software Testing” and check out some of my favorite bloggers (some of whom will want an adult beverage when they find out they’re on the list – and not to celebrate!); love ‘em or hate ‘em; MADE YOU LOOK!!! Enjoy…

Wednesday, August 12, 2009

METRIC OF THE MONTH

Whereby I become a skeet….

For those of you who don’t know what a “skeet” is, it’s a clay pigeon flung into the air used as target practice.

“Metrics” is a dirty word in Testing Land, but as a manager, I believe every competent senior resource out there should be familiar with and able to use metrics for the benefit of their team, themselves, and the company for which they work. Sticking your head in the sand like a flinking ostrich and saying “I won’t, I won’t!” merely ensures someone smarter than you are politically will crush you like a grape.

Big Boys and Girls in the business world read and use metrics in a variety of strategic ways and if you need to survive and thrive in the business world, you need to be able to do the same. Again, it is just an immature resource that refuses to do what they need to do to be successful and then complains when they don’t get any respect.

Some of the strategies supported by metrics aren’t very honest. But that’s a personal choice and it doesn’t have to be yours. Metrics are just numbers and nothing more. Without interpretation, they mean absolutely nothing. Used properly, metrics can be useful, interesting, surprising, and illuminating. It depends on what you collect, how, and why.

What I’m going to offer in “Metric of the Month” are those common metrics that I’ve used personally at various companies and that have proven useful in some way.

I’ll tell you how they were captured and used, what made them useful, and under what circumstances they aren’t useful at all. From there, you can make up your own minds as to what might work for you.

This first post is just to lay some ground rules and talk about metrics in general.

First, any type of statistic assumes you have something to count. If you don’t know how many tests you run, you have nothing to count. If you don’t know how many defects you write or note severities/priorities of those bugs, you won’t have anything to count. So if you work in an environment where none of those things matter, nothing in regards to metrics is going to be especially useful to you right now. But I think it’s still a good idea to understand and be able to use metrics just in case you aren’t always working where you’re working now and they become more important to you at some future date.

If you have test ideas, conditions, scripts, cases, or charters written down somewhere, you can collect statistics. Now I’m aware of the arguments that during a given test a human being may make thousands of observations, any one of which may result in a defect report. But one can argue for weeks regarding whether that is just part of the process of running that test, or if every one of those observations should be documented (which would be a daunting task). For myself, I keep things pretty simple. I have a test condition, which is something I specifically want to examine. I might find something else as I work with that condition, which is an observation and linked to my test. You might feel more comfortable trying to document actions and system reaction. Well, we write test cases as well, but we do it after the testing effort as part of our maintenance process. Regardless, both of us have (some) kind of test idea in (some) kind of format, and that entity (or “test artifact”) can be counted.

Metrics are based on averages. There are times when averages are useful and times when averages are positively stupid. You, as a thinking entity, need to determine what is or is not useful information. One general rule of thumb is that the larger the effort, the more accurate and meaningful metrics become.

Consider, for example, a small, fiendishly complex change. You may only have two tests for it, and you might find one minor error. That equates to a 50% error rate. But what does that number really tell you? Nothing. And if you blend that 50% error rate into another error rate for a total error rate, the 50% might drag your numbers into the toilet. In other words, it might end up painting an inaccurate picture. If, however, you have 1000 tests for that change and you find 500 errors, that 50% error rate paints a more accurate picture of how things stand overall.
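One way a blend can go sideways is by averaging percentages instead of pooling the raw counts. The Python sketch below assumes that is what "blending" means here, which the text does not spell out. The tiny change (2 tests, 1 error) is from the example above; the second, larger change (200 tests, 10 errors) is made up for illustration.

```python
def error_rate(errors_found, tests_run):
    """Percentage of error: (# errors found) / (# tests run) * 100."""
    return 100 * errors_found / tests_run


# (errors found, tests run) for each change.
tiny_change = (1, 2)       # from the text: a 50% error rate on two tests
larger_change = (10, 200)  # hypothetical: a 5% error rate on 200 tests

# Blend 1: average the two percentages, ignoring how many tests sit behind each.
averaged = (error_rate(*tiny_change) + error_rate(*larger_change)) / 2

# Blend 2: pool the raw counts first, then compute one rate.
total_errors = tiny_change[0] + larger_change[0]
total_tests = tiny_change[1] + larger_change[1]
pooled = error_rate(total_errors, total_tests)

print(f"Average of the two rates: {averaged:.1f}%")
print(f"Rate from pooled counts:  {pooled:.1f}%")
```

The averaged figure lets two tests carry as much weight as two hundred, which is how a tiny effort ends up painting an inaccurate picture.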

Another general rule of thumb is that metrics are useless for determining individual worth (or lack thereof). People are smart enough to skew numbers to their own advantage, and you’ll end up encouraging Very Bad Behaviors if you try it.

Consider a testing staff “judged” by the number of defects they write. Oh, you’ll get LOTS of defects written. Good luck in getting all of the bogus, duplicate, and picayune enhancement defects out of your logs, however. Or dealing with the Major Drama that ensues when something good is submitted for test and even your most talented tester simply can’t find much wrong with it. Oddly enough, if you go out RIGHT NOW and take a look at your defect logs, counting who wrote how many bugs, you’ll probably find the numbers line up pretty much in the same order as you suspect in terms of testing talent on your team. Good testers DO inevitably find more bugs. But all of that will change if you start using metrics to “judge” the merits of your team. Some of the worst will end up looking like your best, and some of your best (if they even stick around) will look bad in comparison.

The same types of things happen when numbers are used “against” development staff and they are evaluated according to how many bugs manifest in their code. First of all, you might give your best developer the most complex code, which by its very nature will manifest more errors. And you might give them 4 times the amount of work as a less experienced resource. Evaluating development staff according to the number of errors found in their code is a huge mistake; no one will ever want to work on anything even mildly complex, and you’ll get kickback on every single error that is documented. The development team will hate you. YOU will hate you. Even your dog will hate you, just on general principles.

So I’d recommend avoiding what I call “Evil Metrics” and doing everything possible to avoid quantifying individual goodness with numbers. No matter how much you might hate dealing with “touchy-feely” stuff, people are not “data”.

I’m going to start this series with the type of metrics we customarily pull after a major code migration and work down into layers of complexity. This month, it’s going to be the percentage of error in production; look for the first blog in the series during the next week or so. My Kevlar vest hasn’t arrived yet and the target I’ve painted on my back isn’t dry yet…