Test effectiveness

Boris Beizer stated that black box testing is approximately 35 to 65% effective. I have also read that Gerald Weinberg conducted studies at IBM with similar results. I recently spoke at the SQS conference in London, and in the opening presentation Bob Bartlett stated that SQS studies indicated that formal test design was almost twice as effective in defect detection per test case as expert (exploratory) testing. He also put the infamous "death by checklist" syndrome into perspective.

 

About 4 years ago I began a 3-year study at Microsoft to verify assertions about the effectiveness of testing from a black box approach. I used Weinberg’s famous Triangle paradigm for the assessment. Given a brief functional requirement, participants in the case study were asked to define tests to validate a program written in C# against the stated requirements. The basic requirements are outlined in Glenford Myers’s book The Art of Software Testing as “A program reads three (3) integer values. The three values are interpreted as representing the lengths of the sides of a triangle. The program displays a message that states whether the triangle is scalene, isosceles, or equilateral.”

Based on the implementation in C# (pseudo code below), and assuming that all inputs are valid integer values, we determined that the minimum number of tests needed to validate a program against this functional requirement is 11, as outlined below.

if (a + b <= c) or (b + c <= a) or (a + c <= b)
    then invalid triangle
else if (a equals b) and (b equals c)
    then equilateral triangle
else if (a not equal b) and (b not equal c) and (a not equal c)
    then scalene triangle
else
    isosceles triangle

The minimum tests for conditional control flow and data flow coverage (again assuming valid integer inputs) are:

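· 6 tests to validate the invalid triangle path (a "less than" and an "equal to" test for each of the three sub-expressions)
a + b < c
a + b = c
b + c < a
b + c = a
a + c < b
a + c = b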

· 1 test for the equilateral path

· 1 test for the scalene path

· 3 tests for isosceles (which actually verify the false outcomes in the sub-expressions of the scalene predicate statement)
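The 11 tests listed above can be sketched as a small table-driven check. This is a minimal Python sketch rather than the C# program used in the study: the function name classify_triangle, the MINIMUM_TESTS table, and the specific side lengths are all illustrative assumptions chosen to hit each outcome of the pseudo code.

```python
def classify_triangle(a: int, b: int, c: int) -> str:
    """Direct transcription of the pseudo code; assumes valid integer inputs."""
    if a + b <= c or b + c <= a or a + c <= b:
        return "invalid"
    if a == b and b == c:
        return "equilateral"
    if a != b and b != c and a != c:
        return "scalene"
    return "isosceles"


# Hypothetical input values; any values satisfying the same relations would do.
MINIMUM_TESTS = [
    # 6 tests for the invalid path: "<" and "=" for each sub-expression
    ((1, 2, 4), "invalid"),      # a + b < c
    ((1, 2, 3), "invalid"),      # a + b = c
    ((4, 1, 2), "invalid"),      # b + c < a
    ((3, 1, 2), "invalid"),      # b + c = a
    ((1, 4, 2), "invalid"),      # a + c < b
    ((1, 3, 2), "invalid"),      # a + c = b
    # 1 test for the equilateral path
    ((3, 3, 3), "equilateral"),
    # 1 test for the scalene path
    ((3, 4, 5), "scalene"),
    # 3 tests for isosceles: each makes one scalene sub-expression false
    ((3, 3, 4), "isosceles"),    # a not equal b is false
    ((4, 3, 3), "isosceles"),    # b not equal c is false
    ((3, 4, 3), "isosceles"),    # a not equal c is false
]


def run_minimum_tests() -> list:
    """Return the (sides, expected, actual) triples that fail; empty if all pass."""
    failures = []
    for sides, expected in MINIMUM_TESTS:
        actual = classify_triangle(*sides)
        if actual != expected:
            failures.append((sides, expected, actual))
    return failures


if __name__ == "__main__":
    print(len(MINIMUM_TESTS), "tests,", len(run_minimum_tests()), "failures")
```

A tester who defines only one test per outcome (invalid, equilateral, scalene, isosceles) would exercise 4 of these 11 rows.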

I collected data for 3 years from more than 500 participants whose testing experience ranged from less than 6 months to more than 5 years, but none of whom had formal training in testing techniques or methodologies. Interestingly enough, the data changed very little from the first few groups. The empirical results of this case study demonstrate that the average effectiveness of tests in the most critical area of the program was only 36%. This literally means that of the minimum 11 tests for control and data flow coverage of this section of code, the average tester defined only 4 (1 test for invalid, 1 for equilateral, 1 for scalene, and 1 for isosceles). During this time period Microsoft was also making a transition to hiring testers with greater technical competence and coding skills. Perhaps not surprising to most, the testers with a coding background increased the test effectiveness ratio by 50%.
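The 36% figure is simply the ratio of tests defined to the minimum needed. The short Python sketch below works through that arithmetic. Reading "increased the test effectiveness ratio by 50%" as a relative increase over the 36% baseline is an assumption, since the post does not say whether the increase is relative or absolute; the function and variable names are illustrative.

```python
MINIMUM_TESTS_NEEDED = 11   # from the control and data flow analysis
AVERAGE_TESTS_DEFINED = 4   # invalid, equilateral, scalene, isosceles


def effectiveness(defined: int, needed: int) -> float:
    """Fraction of the minimum test set that was actually defined."""
    return defined / needed


baseline = effectiveness(AVERAGE_TESTS_DEFINED, MINIMUM_TESTS_NEEDED)

# Assumption: "increased by 50%" is relative to the baseline ratio.
with_coding_background = baseline * 1.5

if __name__ == "__main__":
    print(f"baseline: {baseline:.0%}")
    print(f"with coding background (assumed relative increase): "
          f"{with_coding_background:.0%}")
```

Under that assumed reading, the improved ratio corresponds to about 6 of the 11 tests.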

This is just a small snapshot of the overall case study, but the overall conclusions were that untrained testers using only an exploratory black box approach to testing are less effective, and that non-technical testers are 50% more likely to perform redundant or ineffective tests than testers with greater technical competence (not necessarily coding skills, but a greater understanding of the entire system under test).

Some managers at Microsoft scoffed at these results. One said that if he asked one of his non-technical testers to test the design of a new coffee cup, that person would probably do better than someone with a computer science background. OK…he probably has a point. But I would argue that Microsoft and many other software companies are in the business of producing technological solutions for customers, and not in the business of making mugs (unless perhaps Microsoft’s ceramic team is in building 7 and that is a new LOB I don’t know about). The bottom line is that formal training in established, time-proven functional and structural techniques can increase the effectiveness of testers and reduce potential risk in a software project.

Comments

  • Anonymous
    October 15, 2006
    It would be interesting to see the case study as a whole, as there is a danger of misinterpreting the above information. Does the low rate of 36% test effectiveness cause concern?  Perhaps putting testing into the hands of testers who are not trained properly is a great risk factor, and a concern to the testing community?  I would personally be concerned if I was working with testers who could not identify basic tests. And what is the definition of trained and untrained testers?  I've been testing for years, am very keen and self-educated, am not a coder to any extent, but believe I have a good understanding of software.  Am I a trained or untrained tester? You say that Microsoft provides technology solutions and is not in the business of making coffee mugs, which to me does not ring right.  Making coffee mugs to me is the equivalent of Microsoft (or any company) designing software.  I think it's dangerous to measure testers' effectiveness on the basis of the number of tests identified.  I'm sure most of us testers have experience of finding issues that were not covered in tests. After all, if something doesn't look good or work as expected it won't succeed - it doesn't matter whether it's functionally correct. It's all about balance...

  • Anonymous
    October 15, 2006
    The comment has been removed

  • Anonymous
    October 18, 2006
    The comment has been removed

  • Anonymous
    October 19, 2006
    The comment has been removed

  • Anonymous
    October 31, 2006
    The comment has been removed

  • Anonymous
    March 06, 2007
    I think the most interesting thing about this test is that, by definition, every equilateral triangle is also an isosceles triangle.

  • Anonymous
    March 06, 2007
    The comment has been removed

  • Anonymous
    March 08, 2007
    Of course.  But I think it's a mistake for the specifications to imply that 'equilateral' and 'isosceles' are mutually-exclusive. In the general case, I think it's an interesting topic for testing:  Is it even possible for the specifications to be wrong, in the context of software testing?

  • Anonymous
    March 08, 2007
    Yes, it is absolutely possible for the specification to be wrong; not just in the context of software testing, but also in the design and development of the software. In this particular example of the 'triangle' problem (which is well over 20 years old) the requirements state we specifically want to identify 3 different types of triangles (equilateral, scalene, and isosceles). You may 'think' the requirements are wrong (everyone is entitled to their opinion), but it is what it is. We could display a message, shown when users input 3 equal values, that reads "These values equate to an equilateral triangle which is a special type of isosceles triangle." Perhaps that would satisfy those who really want to over-analyze this simple parable. Personally I think you are putting way too much energy into trying to draw me into a philosophical debate on the differences between an equilateral triangle and an isosceles triangle.

  • Anonymous
    March 08, 2007
    Heh; sorry about that.  I don't mean to belabour this issue.