December 2009

Volume 24 Number 12

Test Run - Pairwise Testing with QICT

By James McCaffrey | December 2009

A solid knowledge of pairwise testing principles is essential for all software testers, developers and managers. In this month’s column, I explain exactly what pairwise testing is and provide you with complete C# source code for a production-quality pairwise testing tool named QICT. In short, pairwise testing is a technique that allows you to reduce a large, unmanageable set of test-case inputs to a much smaller set that is likely to reveal bugs in the system under test. The best way to explain pairwise testing, and to show you where I’m headed in this article, is by way of two screenshots. Consider the dummy Windows Form-based application shown in Figure 1. The application has four input parameters. The first parameter is a TextBox control that can accept “a” or “b”. The second parameter is a group of RadioButton controls that can take a value of “c”, “d”, “e” or “f”. The third parameter is a ComboBox control that can take a value of “g”, “h” or “i”. The fourth parameter is a CheckBox control that takes a value of either “j” or “k”. So one test-case input set would be { “a”, “c”, “g”, “j” }. The dummy application has a total of 2 * 4 * 3 * 2 = 48 possible input sets, which is certainly manageable. But imagine a card-playing application of some sort with five parameters, where each parameter can take on one of 52 values (to represent a card from a normal deck of playing cards, with replacement). In this situation there would be 52 * 52 * 52 * 52 * 52 = 380,204,032 possible input sets, which is likely to be unmanageable unless you could programmatically generate expected values for each test set input.

A Dummy Application with Four Input Parameters

Figure 1 A Dummy Application with Four Input Parameters

The idea of pairwise testing is to generate a list of test sets that capture all possible pairs of parameter values from each parameter. For the example shown in Figure 1, there are a total of 44 such input pairs:

(a,c), (a,d), (a,e), (a,f), (a,g), (a,h), (a,i), (a,j), (a,k), (b,c), (b,d), (b,e), (b,f), (b,g), (b,h), (b,i), (b,j), (b,k), (c,g), (c,h), (c,i), (c,j), (c,k), (d,g), (d,h), (d,i), (d,j), (d,k), (e,g), (e,h), (e,i), (e,j), (e,k), (f,g), (f,h), (f,i), (f,j), (f,k), (g,j), (g,k), (h,j), (h,k), (i,j), (i,k)

Now the test set { “a”, “c”, “g”, “j” } captures six of the 44 pairs: (a,c), (a,g), (a,j), (c,g), (c,j) and (g,j). So the goal of pairwise test set generation is to produce a collection of test sets that capture all 44 pairs. Take a look at the screenshot in Figure 2.

Pairwise Test Set Generation with the QICT Tool

Figure 2 Pairwise Test Set Generation with the QICT Tool

The screenshot shows a tool named qict.exe generating a collection of 12 test sets that capture all 44 input pairs for the scenario shown in Figure 1. If you trace through each pair of values in the 12 test sets generated in Figure 2, you’ll see that they do in fact capture all 44 pairs listed above. So in this situation, we have reduced our possible test-case inputs from 48 test cases to 12 test cases. The savings aren’t very significant for this small example, but as I’ll show in a moment, using pairwise testing can dramatically reduce the number of test-case inputs in many situations. The underlying assumption of pairwise testing is that software bugs are more frequently found in code that involves the interaction of values from different parameters than in code that involves values from within a particular parameter. In other words, for the dummy application in Figure 1, application code that deals with inputs “a” and “g” is more likely to introduce a logic error than code that deals with inputs “a” and “b”. This is a notion that is, in fact, supported by some research.

Using the PICT Tool

There are several pairwise test set generation tools available to you. My favorite tool in most situations is the PICT (Pairwise Independent Combinatorial Testing) tool. PICT was written by my colleague Jacek Czerwonka, who adapted code from an existing internal-Microsoft pairwise tool. PICT is available as a free download from several locations, including the Microsoft Tester Center page at msdn.microsoft.com/testing/bb980925.aspx. If you search the Internet, you will also find several other pairwise test set generation tools. However, PICT is a single executable that runs from a shell command line. PICT is very fast, very powerful and should meet your pairwise testing needs in most situations. I named the tool presented in this article QICT (which doesn’t stand for anything in particular) to acknowledge the importance of the PICT tool.

So, why yet another pairwise test set generator? There are several reasons.  First, although PICT is a wonderful tool, it is written in native C++ code and the source code is not available. The QICT tool presented here is, as far as I can tell, the first production-quality pairwise tool written with managed C# code. The availability of the code allows you to freely modify QICT to meet your own needs. For example, you can modify QICT to directly read its input from an XML file or a SQL database, or you can modify QICT to directly emit results in a custom output format. And you may want to experiment with the tool’s logic, say, for example, by introducing constraints (test input sets that are not permitted), by introducing required test sets, or changing how the tool generates its test set collection. Additionally, the availability of QICT source code allows you to copy and place pairwise test set generation code directly into a .NET application or test tool. Finally, although source code for a few pairwise test set generation tools is available on the Internet, some of these tools are quite inefficient. For example, consider a situation with 20 parameters, each of which has 10 values. For this scenario there are 10 * 10 * 10 * . . . * 10 (20 times) = 10^20 = 100,000,000,000,000,000,000 possible test-case inputs. This is a lot of test cases. The PICT tool reduces this to only 217 pairwise test sets, and the QICT tool produces either 219 or 216 test sets (depending upon the seed value of a random number generator, as I’ll explain shortly). However, one widely referenced pairwise test set generation tool written in Perl produces 664 sets. Finally, with the QICT source code available and this article’s explanation of the algorithms used, you can recast QICT to other languages, such as Perl, Python, Java or JavaScript if you wish.

The QICT Tool

The code for the QICT tool is slightly too long to present in its entirety in this column, but the entire source code is available from the MSDN Code Gallery at code.msdn.microsoft.com. I will describe the algorithms and data structures I use, along with snippets of key code, so you’ll have enough information to use and modify QICT as needed. The essence of how QICT works is to generate one test set at a time, using greedy algorithms to place each parameter value, until all possible pairs have been captured. The high-level algorithm for QICT is presented in Figure 3.

Figure 3 QICT Algorithm

read input file
create internal data structures

create an empty testset collection
while (number of unused pairs > 0)
  for i := 1 to candidate poolSize
    create an empty candidate testset
    pick the "best" unused pair
    place best pair values into testset
    foreach remaining parameter position
      pick a "best" parameter value
      place the best value into testset
    end foreach
  end for
  determine "best" candidate testset
  add best testset to testset collection
  update unused pairs list
end while

display testset collection

The key to implementing this high-level algorithm is determining what kind of data structures to use and what the various “best” options are. The QICT source code begins like this:

static void Main(string[] args)
{
  string file = args[0];
  Random r = new Random(2);
  int numberParameters = 0;
  int numberParameterValues = 0;
  int numberPairs = 0;
  int poolSize = 20;

I coded QICT using a traditional procedural style rather than taking an object-oriented approach so you can more easily refactor QICT to languages with limited OOP support, such as Perl and JavaScript. I first read an input file from the command line. As you can see, in order to keep my code clean and simple, I have left out normal error-checking you’d want to include. The input file for QICT is the same as that used by PICT, a simple text file that looks like:

Param0: a, b
Param1: c, d, e, f
etc.

Parameter names are followed by a colon character and a comma-delimited list of legal values for that parameter. Parameter values must be distinct. Next, I instantiate a Random object. The choice of a seed value of 2 is arbitrary, but any value will make QICT produce the same results for an input set every time it is run. I’ll explain the purpose of the pseudo-random number object shortly. I declare three int variables that will be assigned values when the input file is read. For the example shown in Figure 2, numberParameters is 4, numberParameterValues is 11 and numberPairs is 44. The poolSize variable stores the number of candidate test sets to generate for each test set. If you experiment with QICT a bit, you’ll see that the tool is impacted in a rather surprisingly minor way by adjusting the value for poolSize. The heart of QICT is the declaration of the main data structures. The first four objects are:

 

int[][] legalValues = null;
string[] parameterValues = null;
int[,] allPairsDisplay = null;
List<int[]> unusedPairs = null;

The legalValues object is a jagged array where each cell in turn holds an array of int values. The legalValues array holds an in-memory representation of the input file, so cell 0 of legal values holds an array that in turn holds the values 0 (to represent parameter value “a”) and 1 (to represent “b”). It turns out that working directly with string values is rather inefficient and that representing parameter values as integers yields significantly faster performance. The parameterValues string array holds the actual parameter values and is used at the end of QICT to display results as strings rather than ints. So, for the preceding example, cell 0 holds “a”, cell 1 holds “b” and so on through cell 10, which holds “k”. The allPairsDisplay object is a two-dimensional array of ints. It is populated by all possible pairs. For our example, cell [0,0] holds 0 (for “a”) and cell [0,1] holds 2 (for “c”)—the first possible pair. Cell [1,0] holds 0 and cell [1,1] holds 3 to represent the second pair, (a,d). The unusedPairs object is a generic List of int arrays. The first item in unusedPairs is initially {0,2}. I use a List collection for unusedPairs rather than an array because each time a new test set is added to the test sets collection, I remove the pairs generated by the new test set from unusedPairs. Additionally, this means I have a convenient stopping condition that will occur when unusedPairs.Count reaches 0.

The next four main program data structures are:

int[,] unusedPairsSearch = null;
int[] parameterPositions = null;
int[] unusedCounts = null;
List<int[]> testSets = null;

Most pairwise test set generation tools, including QICT, perform a huge number of searches. An efficient lookup approach is crucial for reasonable performance. Here I declare a two-dimensional array named unusedPairsSearch. It is a square array with size numberParameterValues by numberParameterValues, where each cell holds a 1 if the corresponding pair has not been used, and a 0 if the corresponding pair has been used or is not a valid pair. Initially, the first three rows of unusedPairsSearch for the example in Figure 2 are:

0 0 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1

And so forth.

So row one means pairs (0,0) and (0,1)—that is, (a,a) and (a,b)—are not valid, while pairs (0,2), (0,3), . . . (0,10)—that is (a,c), (a,d) through (a,k)—have not yet been captured by a test set. The parameterPositions array holds the location within a test set of a specified parameter value. After initialization this array holds values:

0 0 1 1 1 1 2 2 2 3 3

The index of parameterPositions represents a parameter value, and the corresponding cell value represents its position in a test set. So the fourth cell from the left has index = 3 and value = 1, meaning parameter value 3 (“d”) belongs at position 1 (the second slot) in a test set. The unusedCounts object is a one-dimensional array that holds the number of times a particular parameter value appears in the unusedPairs array. Initially unusedCounts holds:

9 9 7 7 7 7 8 8 8 9 9

The index represents a parameter value, and the corresponding cell value is the unused count. So, the fourth cell from the left has index = 3 and value = 7, meaning parameter value 3 (“d”) initially appears in 7 unused pairs— (a,d), (b,d), (d,g), (d,h), (d,i), (d,j) and (d,k). The testSets object holds the pairwise test set results. It is initially empty but grows every time a new test set is generated. Each test set is represented by an int array. So, in Figure 2, the first test set in the result is {“a”, “c”, “g”, “j” }, which is stored in the testSets List as an array with values {0,2,6,9}.

With the key data structures in place, QICT reads the input file to determine values for numberParameters and numberParameterValues, and to populate the legalValues and parameterValues arrays. I use the relatively crude approach of performing an initial read of the file, and then resetting the file pointer and performing a second pass through the file. Once legalValues is populated, I can scan through it to determine the number of pairs for the input:

for (int i = 0; i <= legalValues.Length - 2; ++i) {
  for (int j = i + 1; j <= legalValues.Length - 1; ++j) {
    numberPairs += (legalValues[i].Length * legalValues[j].Length);
  }
}
Console.WriteLine("\nThere are " + numberPairs + " pairs ");

After initialization, the first row of legalValues holds {0,1} and the second row holds {2,3,4,5}. Notice that the pairs determined by these two rows are (0,2), (0,3), (0,4), (0,5), (1,2), (1,3), (1,4), and (1,5), and that in general the number of pairs determined by any two rows in legalValues is the product of the number of values in the two rows, which equals the row Length property of the rows. The next part of QICT code populates the unusedPairs List:

unusedPairs = new List<int[]>();
for (int i = 0; i <= legalValues.Length - 2; ++i) {
  for (int j = i + 1; j <= legalValues.Length - 1; ++j) {
    int[] firstRow = legalValues[i];
    int[] secondRow = legalValues[j];
    for (int x = 0; x < firstRow.Length; ++x) {
      for (int y = 0; y < secondRow.Length; ++y) {
        int[] aPair = new int[2];
        aPair[0] = firstRow[x];
        aPair[1] = secondRow[y];
        unusedPairs.Add(aPair);
      }
    }
  }
}

Here I grab each pair of rows from legalValues using indexes i and j. Next, I walk through the values in each row pair using indexes x and y. Extensive use of multiple nested for loops like this is a hallmark of combinatorial code. When I write such code, I always draw by hand on a piece of paper the arrays involved because it’s quite easy to make mistakes without a diagram. After populating the unusedPairs List, I use the same nested loop structure to populate the allPairsDisplay and unusedPairsSearch arrays. The initialization code next populates the parameterPositions array by iterating through legalValues:

parameterPositions = new int[numberParameterValues];
int k = 0;
for (int i = 0; i < legalValues.Length; ++i) {
  int[] curr = legalValues[i];
  for (int j = 0; j < curr.Length; ++j) {
    parameterPositions[k++] = i;
  }
}

The initialization code concludes by populating the unusedCounts array:

unusedCounts = new int[numberParameterValues];
for (int i = 0; i < allPairsDisplay.GetLength(0); ++i) {
  ++unusedCounts[allPairsDisplay[i, 0]];
  ++unusedCounts[allPairsDisplay[i, 1]];
}

Here, as in many of the QICT routines, I take advantage of the fact that C# automatically initializes all cells in int arrays to 0. If you wish to recast QICT to an object-oriented style, all these initialization routines would likely best be placed either in an object constructor or perhaps in an explicit Initialize() method. The main processing loop begins:

testSets = new List<int[]>();
while (unusedPairs.Count > 0) {
  int[][] candidateSets = new int[poolSize][];
  for (int candidate = 0; candidate < poolSize; ++candidate) {
    int[] testSet = new int[numberParameters];
  // fill candidate testSets
  }
  // copy best testSet into testSets collection; upate data structues
}

Because the number of candidate test sets is known to be poolSize, 
I can instantiate an array rather than use a dynamic-sized List object. Notice that the size of the unusedPairs collection controls the main processing loop exit. Now it’s time to pick the “best” unused pair—and now things start to become really interesting:

int bestWeight = 0;
int indexOfBestPair = 0;
for (int i = 0; i < unusedPairs.Count; ++i) {
  int[] curr = unusedPairs[i];
  int weight = unusedCounts[curr[0]] + unusedCounts[curr[1]];
  if (weight > bestWeight) {
    bestWeight = weight;
    indexOfBestPair = i;
  }
}

Here I define best to mean the unused pair that has the highest sum of unused individual parameter values. For example, if “a” appears one time in the current list of unused pairs, “b” appears two times, “c” three times and “d” four times, then pair (a,c) has weight 1 + 3 = 4, and pair (b,d) has weight (b,d) 2 + 4 = 6, so pair (b,d) would be selected over (a,c).

There are many other weighting schemes you might wish to explore. For example, using some sort of multiplication would give higher weights to pairs with extreme values of unused counts compared with pairs that have unused counts closer together. Another possibility is to track used counts—the number of times parameter values appear in the test sets already added to the result testSets collection—and pick as the best pair the one that has the least used counts. Once the best unused pair has been determined, I create a two-cell array to hold the pair values and determine the positions within a test set where each value belongs:

int[] best = new int[2];
unusedPairs[indexOfBestPair].CopyTo(best, 0);
int firstPos = parameterPositions[best[0]];
int secondPos = parameterPositions[best[1]];

At this point I have an empty test set and a pair of values to place in the test set, and I know the location within the test set where the values belong. The next step is to generate parameter values for the remaining positions in the test set. Now, rather than fill the test set positions in some fixed order (from low index to high), it turns out that it is much better to fill the test set in random order. First, I generate an array that holds the parameter positions in sequential order:

int[] ordering = new int[numberParameters];
for (int i = 0; i < numberParameters; ++i) 
  ordering[i] = i;

Next, I rearrange the order by placing the known locations of the first two values from the best pair into the first two cells of the ordering array:

ordering[0] = firstPos;
ordering[firstPos] = 0;
int t = ordering[1];
ordering[1] = secondPos;
ordering[secondPos] = t;

And now I shuffle the remaining positions (from cell 2 and up) using the Knuth shuffle algorithm. This is why I created a Random object at the beginning of the QICT code. The number of test sets produced by QICT is surprisingly sensitive to the value of the pseudo-random number generator seed value, so you may want to experiment with several seed values. For the situation with 20, 10-value parameters I described earlier, using a seed value of 2 generates 219 test sets, and a seed value of 6 generates 216 test sets, but a seed value of 0 yields 221 test sets.

for (int i = 2; i < ordering.Length; i++) {
  int j = r.Next(i, ordering.Length);
  int temp = ordering[j];
  ordering[j] = ordering[i];
  ordering[i] = temp;
}

After shuffling, I place the two values from the best pair into the candidate test set:

testSet[firstPos] = best[0];
testSet[secondPos] = best[1];

Now comes the most important part of the QICT algorithm. I must determine the best parameter values to place in each of the empty test set positions. The technique I use is another greedy approach. For each parameter position, I test each possible legal value at that position, by counting how many unused pairs in the test value, when combined with the other values already in the test set capture. Then I select the parameter value that captures the most unused pairs. The code to do this is the trickiest part of QICT and is listed in Figure 4.

Figure 4 Filling Test Set with Best Parameter Values

for (int i = 2; i < numberParameters; ++i) {
  int currPos = ordering[i];
  int[] possibleValues = legalValues[currPos];
  int currentCount = 0;
  int highestCount = 0;
  int bestJ = 0; 
  for (int j = 0; j < possibleValues.Length; ++j) {
    currentCount = 0;
    for (int p = 0; p < i; ++p) {
      int[] candidatePair = new int[] { possibleValues[j],
        testSet[ordering[p]] };
      if (unusedPairsSearch[candidatePair[0], candidatePair[1]] == 1 ||
        unusedPairsSearch[candidatePair[1], candidatePair[0]] == 1)
        ++currentCount;
    }
    if (currentCount > highestCount) {
      highestCount = currentCount;
      bestJ = j;
    }
  }
  testSet[currPos] = possibleValues[bestJ];
}

The outermost loop in Figure 4 is a count of the total number of test set positions (given by numberParameters), less two (because two spots are used by the best pair). Inside that loop I fetch the position of the current spot to fill by looking into the ordering array I created earlier. The currentCount variable holds the number of unused pairs captured by the test parameter value. Notice that because I am filling test set positions in random order, the candidate pair of values can be out of order, so I need to check two possibilities when I do a lookup into the unusedPairsSearch array. At the end of the code in Figure 4, I will have a candidate test set that has values in every position that were selected using greedy algorithms. Now I simply add this candidate test set into the collection of candidates:

candidateSets[candidate] = testSet;

At this point I have n = poolSize candidate test sets and I need to select the best of these to add into the primary testSet result collection. I could assume that the first candidate test set captures the most unused pairs and simply iterate through each candidate starting at position 0, but again, introducing some randomness produces better results. I pick a random spot within the candidates and assume it is the best candidate:

int indexOfBestCandidate = r.Next(candidateSets.Length);
int mostPairsCaptured =
  NumberPairsCaptured(candidateSets[indexOfBestCandidate],
    unusedPairsSearch);

Here I use a little helper function named NumberPairsCaptured() to determine how many unused pairs are captured by a given test set. The helper function is:

static int NumberPairsCaptured(int[] ts, int[,] unusedPairsSearch) 
{
  int ans = 0;
  for (int i = 0; i <= ts.Length - 2; ++i) {
    for (int j = i + 1; j <= ts.Length - 1; ++j) {
      if (unusedPairsSearch[ts[i], ts[j]] == 1)
        ++ans;
    }
  }
  return ans;
}

Now I walk through each candidate test set, keeping track of the location of the one that captures the most unused pairs:

for (int i = 0; i < candidateSets.Length; ++i) {
  int pairsCaptured = NumberPairsCaptured(candidateSets[i],
    unusedPairsSearch);
  if (pairsCaptured > mostPairsCaptured) {
    mostPairsCaptured = pairsCaptured;
    indexOfBestCandidate = i;
  }
}

And now I copy the best candidate test set into the main result testSets List object:

int[] bestTestSet = new int[numberParameters];
candidateSets[indexOfBestCandidate].CopyTo(bestTestSet, 0);
testSets.Add(bestTestSet);

At this point, I have generated and added a new test set, so I must update all the data structures that are affected, namely, the unusedPairs List (by removing all pairs that are generated by the new test set), the unusedCounts array (by decrementing the count for each parameter value in the new test set), and the unusedPairsSearch matrix (by flipping the values associated with each pair generated by the new test set from 1 to 0).

Now I’m at the end of my main processing loop. I continue generating candidates, selecting the best candidate, adding the best candidate to testSets and updating data structures operations. The processing will end when the number of unused pairs reaches 0.

Then I display the final results:

Console.WriteLine("\nResult testsets: \n");
for (int i = 0; i < testSets.Count; ++i) {
  Console.Write(i.ToString().PadLeft(3) + ": ");
  int[] curr = testSets[i];
  for (int j = 0; j < numberParameters; ++j) {
    Console.Write(parameterValues[curr[j]] + " ");
  }
  Console.WriteLine("");
}
  Console.WriteLine("");
}

As I mentioned earlier, if you are modifying QICT to suit your own particular testing scenario, you may want to emit results directly to an XML file, a SQL database or some other form of storage.

Produce Better Systems

Pairwise testing is a combinatorial technique with probabilistic factors. Pairwise test set generation is an important technique, but it isn’t magic. Remember that pairwise techniques simply reduce the number of test-case inputs in situations where you just have too many test cases to deal with. Pairwise test set generation does not create test-case expected results. You should always begin by using normal testing principles, such as looking at boundary conditions, using pure random input and so on, and then use pairwise testing to supplement your test-case generation. Additionally, as a general rule of thumb, more testing is better, so there’s no reason why you can’t add additional test-case inputs to those produced by pairwise generation tools. Although pairwise testing is useful in many situations, be sure to use it only when appropriate.

I have found pairwise test set generation to be very useful for configuration testing, for module testing methods that accept enumerated values and for testing SQL databases where each column in a table has a relatively small number of different values. Pairwise testing is not necessarily a good approach for scenarios where you have a relatively small number of test-case inputs, or when you can programmatically produce test-case-expected results (and therefore deal with a large test-case input set). And pairwise testing is not normally usable when the input values to the system under test are not discrete. However, even in situations where the number of possible parameter values is very large, you may be able to effectively use pairwise test-case input generation by separating parameter values into equivalence classes. When used properly, pairwise test set generation is an important technique that can help you produce better software systems.


Dr. James McCaffrey works for Volt Information Sciences Inc., where he manages technical training for software engineers based at Microsoft’s Redmond, Wash., campus. He has worked on several Microsoft products, including Internet Explorer and MSN Search, and is the author of  .NET Test Automation Recipes: A Problem-Solution Approach (Apress, 2006). James can be reached at jmccaffrey@volt.com or v-jammc@microsoft.com.

Thanks to the following technical expert for reviewing this article: Jacek Czerwonka

Send your questions and comments for James to testrun@microsoft.com.