May 2014

Volume 29 Number 5

# Test Run : Association Rule Learning

James McCaffrey

Association rule learning is a technique from the field of machine learning that extracts if-then rules from a set of data. For example, if the data being explored is a set of supermarket transactions, one association rule might be, “IF a customer buys apples and coffee THEN there is a high likelihood he will also buy butter and donuts.”

There are several types of association rules. This article explains how to extract high-confidence rules, which are characterized by being true in a specified minimum percentage of the transactions being analyzed. Association rule learning can be applied to many kinds of data besides purchasing transactions, including system log files, user search queries and natural UI commands. This article explains how high-confidence association rule learning works and presents a complete demo program.

The best way to see where this article is headed is to take a look at the demo program in Figure 1. The demo sets up 10 dummy transactions, encoded so each item is a 0-based value. For example, the first transaction is (0 3 4 11), which means that items 0, 3, 4 and 11 occur together. In most cases, duplicate transactions are allowed, such as transactions 2 and 3.

Figure 1 Finding High-Confidence Association Rules

Although you can perform a search for high-confidence association rules directly on the encoded transactions data, in most situations frequent item-sets are first extracted from the transactions. An item-set is a subset of all possible transactions, and frequent item-sets are those that meet some user-specified minimum number of occurrences (called the support level) in the transactions set. In the demo, using a support value of 0.30 (so an item-set must occur in at least 0.30 * 10 = 3 transactions), there are eight frequent item-sets. The first is (2 5). Items 2 and 5 occur together in three transactions: 7, 8 and 9. Frequent item-sets eliminate outlier transactions that might generate a very high-confidence rule but are so rare the rule isn’t particularly relevant. In most cases, frequent item-sets are distinct, with no duplicates allowed. Extracting frequent item-sets from transactions is a surprisingly difficult task. See “Frequent Item-Sets for Association Rule Learning” in the January 2013 issue of MSDN Magazine at msdn.microsoft.com/magazine/dn519928.

Behind the scenes, the demo uses the list of frequent item-sets to generate candidate rules. Each candidate rule is evaluated to determine if the rule meets a user-supplied minimum threshold of likelihood called the confidence value. Those candidate rules that meet or exceed the likelihood level are identified as high-­confidence rules. In the demo, the confidence value is set to 0.700, which means that a candidate rule must be true for at least 70 percent of the transactions for which the rule is applicable.

In the demo, 16 high-confidence rules were found. The first rule listed is IF (2) THEN (5), with a computed 0.75 confidence value. You can think of this as, “If a transaction has item 2, then the transaction probably also contains item 5.” The IF part of the rule, which is called the antecedent, applies to four transactions: 6, 7, 8 and 9. The rule is true for three of those four transactions: 7, 8 and 9, so the computed confidence is 3/4 = 0.75.

Notice that high-confidence rules are not necessarily symmetric. There is no IF (5) THEN (2) rule because that rule is applicable to transactions 1, 4, 5, 7, 8, and 9, but is true only for transactions 7, 8, and 9, so the confidence value of 3/6 = 0.50 doesn’t meet the minimum 0.700 value.

This article assumes you have advanced programming skills but doesn’t assume you know anything about association rule learning. The demo program is coded using C# but you should be able to refactor the code to other .NET languages such as Visual Basic or IronPython without too much difficulty. Most normal error-checking has been removed to keep the size of the code small and the main ideas clear. The complete source code for the demo is presented in this article and the code is also available as a download from msdn.microsoft.com/magazine/msdnmag0514.

## The Rule-Finding Algorithm

The algorithm used by the demo program to find high-confidence rules is illustrated in Figure 2. The diagram shows how frequent item-set (3 4 7) generates two non-high-confidence rules and four high-confidence rules. The algorithm starts by generating all mathematical combinations for size k = 1 through size item-set length - 1. Mathematical combinations are the key to the association rule learning algorithm presented in this article. A mathematical combination is a set of numbers that represents a subset. For example, if there are 5 items = (0, 1, 2, 3, 4) and the subset size k is 3, the 10 possible combination elements are:

``````(0, 1, 2)
(0, 1, 3)
(0, 1, 4)
(0, 2, 3)
(0, 2, 4)
(0, 3, 4)
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
``````

Figure 2 Algorithm to Find High-Confidence Rules

The elements in a mathematical combination are not transaction items, they’re just numbers. In Figure 2, each combination is applied to the frequent item-set to generate a subset of the item-set, which is interpreted as the antecedent (if-part) of a candidate rule. For example, the last combination for k = 2 is (1, 2) so the items in the frequent item-set (3 4 7) at indices 1 and 2 are used as the candidate rule antecedent: “IF (4 7).” The then-part of the candidate rule consists of those items in the item-set being examined that are not used in the if-part. So for item-set (3 4 7), if the antecedent is (4 7), the then-part (called the consequent) is (3), and the full candidate rule is “IF (4 7) THEN (3).”

Somewhat surprisingly, the then-part of a candidate rule isn’t needed to compute the rule’s confidence value. Consider the candidate rule IF (4 7) THEN (3). To compute the confidence, a count of transactions to which the rule is applicable is needed. These would be those transactions that contain items 4 and 7, which in the case of the demo is 3 (transactions 2, 3 and 6). The second part of the computation needed is the number of applicable transactions for which the candidate rule is true, in other words, those transactions that have items 4, 7 and also item 3. But this is just the number of transactions that contain the source frequent item-set (3 4 7).

In Figure 2, the confidence values for each of the six candidate rules are computed, and the four that meet the confidence threshold are identified as high-confidence rules.

To summarize, starting from a set of frequent item-sets that meet a minimum frequency-of-occurrence support level, for each frequent item, all mathematical combinations from size 1 through one less than the number of items in the item-set are generated. Each combination determines an IF part and a THEN part of a candidate rule. Candidate rules that meet a minimum confidence level, which is the proportion of applicable frequent item-sets for which the rule is true, are labeled as good (high-confidence) rules and saved.

## Overall Program Structure

To create the demo program I launched Visual Studio. The demo program has no significant .NET dependencies and any version of Visual Studio that supports the Microsoft .NET Framework 2.0 or later should work fine. I created a C# console application program and named it AssocRuleLearn. After the template-generated code loaded into the editor, in the Solution Explorer window I renamed Program.cs to the more descriptive AssocRuleProgram.cs and Visual Studio automatically renamed class Program for me. At the top of the source code, I deleted all references to unneeded namespaces, leaving just those to System and Collections.Generic.

The overall program structure, with some WriteLine statements removed and a few minor edits to save space, is presented in Figure 3.

Figure 3 Association Rule Demo Program Structure

``````using System;
using System.Collections.Generic;
namespace AssocRuleLearn
{
class AssocRuleProgram
{
static void Main(string[] args)
{
try
{
Console.WriteLine("\nBegin demo\n");
List<int[]> transactions = new List<int[]>();
transactions.Add(new int[] { 0, 3, 4, 11 });
transactions.Add(new int[] { 1, 4, 5 });
transactions.Add(new int[] { 3, 4, 6, 7 });
transactions.Add(new int[] { 3, 4, 6, 7 });
transactions.Add(new int[] { 0, 5 });
transactions.Add(new int[] { 3, 5, 9 });
transactions.Add(new int[] { 2, 3, 4, 7 });
transactions.Add(new int[] { 2, 5, 8 });
transactions.Add(new int[] { 0, 1, 2, 5, 10 });
transactions.Add(new int[] { 2, 3, 5, 6, 7, 9 });
ShowList(transactions);
List<int[]> freqItemSets = new List<int[]>();
freqItemSets.Add(new int[] { 2, 5 });
freqItemSets.Add(new int[] { 3, 4 });
freqItemSets.Add(new int[] { 3, 6 });
freqItemSets.Add(new int[] { 3, 7 });
freqItemSets.Add(new int[] { 4, 7 });
freqItemSets.Add(new int[] { 6, 7 });
freqItemSets.Add(new int[] { 3, 4, 7 });
freqItemSets.Add(new int[] { 3, 6, 7 });
ShowList(freqItemSets);
double minConPct = 0.70;
List<Rule> goodRules =
GetHighConfRules(freqItemSets, transactions, minConPct);
Console.WriteLine("\nDone. Rules are:\n");
for (int i = 0; i < goodRules.Count; ++i)
Console.WriteLine(goodRules[i].ToString());
Console.WriteLine("\nEnd demo\n");
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
} // Main
static void ShowList(List<int[]> trans)
{
for (int i = 0; i < trans.Count; ++i)
{
for (int j = 0; j < trans[i].Length; ++j)
Console.Write(trans[i][j] + " ");
Console.WriteLine(")");
}
}
static List<Rule> GetHighConfRules(List<int[]> freqItemSets,
List<int[]> transactions, double minConfidencePct) { . . }
static int[] NewCombination(int k) { . . }
static int[] NextCombination(int[] comb, int n) { . . }
static int[] MakeAntecedent(int[] itemSet, int[]comb) { . . }
static int[] MakeConsequent(int[] itemSet, int[]comb) { . . }
static int CountInTrans(int[] itemSet,
List<int[]> trans, Dictionary<int[], int> countDict) { . . }
static bool IsSubsetOf(int[] itemSet, int[] trans) { . . }
static int IndexOf(int[] array, int item, int startIdx) { . . }
}
public class Rule
{
public int[] antecedent; // If part
public int[] consequent; // Then part
public double confidence;
public Rule(int[] antecedent, int[] consequent, double confidence)
{
this.antecedent = new int[antecedent.Length];
Array.Copy(antecedent, this.antecedent, antecedent.Length);
this.consequent = new int[consequent.Length];
Array.Copy(consequent, this.consequent, consequent.Length);
this.confidence = confidence;
}
public override string ToString()
{
string s = "IF ( ";
for (int i = 0; i < antecedent.Length; ++i)
s += antecedent[i] + " ";
s += ")";
string t = " THEN ( ";
for (int i = 0; i < consequent.Length; ++i)
t += consequent[i] + " ";
t += ") ";
return s + t + "conf = " + confidence.ToString("F2");
}
}
}
``````

The demo defines both transactions and item-sets as arrays of type int and sets up a List of hardcoded transactions. You may want to create a program-defined ItemSet class. In most situations the raw transaction data will be in a text file or SQL database and need to be encoded. The demo sets up hardcoded frequent item-sets. In all but very small demo scenarios you’ll need to extract frequent item-sets programmatically, which is a very difficult task.

All the work of finding high-confidence rules is performed by method GetHighConfRules. That method requires a user-specified minimum confidence percentage parameter value. Meaningful confidence values will vary from problem to problem.

The demo program uses a program-defined Rule class. Class members are the rule antecedent (if-part), rule consequent (then-part) and the computed confidence value. The class has just a constructor and a ToString method, which is hacked to format the demo data nicely.

Method GetHighConfRules calls several helper methods. Method CountInTrans counts the number of times an item-set or rule antecedent or rule consequent occurs in a list of transactions. This helper method calls sub-helper IsSubsetOf, which in turn calls sub-helper IndexOf. Method NewCombination creates a mathematical combination. Method NextCombination returns the next combination element for a given combination. Methods MakeAntecedent and MakeConsequent return the if-part and the then-part for a candidate rule, given a frequent item-set and a mathematical combination.

## Mathematical Combinations

If you refer to the diagram in Figure 2, you’ll see the algorithm used by the demo program requires creating mathematical combinations and the ability to generate the successor to a given combination. Method NewCombination is defined as:

``````static int[] NewCombination(int k)
{
int[] result = new int[k];
for (int i = 0; i < result.Length; ++i)
result[i] = i;
return result;
}
``````

Here, a combination is an array of int. The input parameter k determines the size of the combination. A new combination is values 0 through k-1, in order. Method NextCombination is defined as:

``````static int[] NextCombination(int[] comb, int n)
{
int[] result = new int[comb.Length];
int k = comb.Length;
if (comb[0] == n - k) return null;
Array.Copy(comb, result, comb.Length);
int i = k - 1;
while (i > 0 && result[i] == n - k + i) --i;
++result[i];
for (int j = i; j < k - 1; ++j)
result[j + 1] = result[j] + 1;
return result;
}
``````

Mathematical combinations are fascinating topics in their own right, and somewhat complicated. Method NextCombination is short but not trivial. It accepts a combination, and the number of possible items in the combination. The method returns the lexicographical successor to the input combination, or null if the input combination is the last one in lexicographical order. For example, if a combination with n = 6 and k = 3 is (1, 4, 5) then the next combination is (2, 3, 4).

## Generating Rule Antecedents and Consequents

Helper method MakeAntecedent accepts a frequent item-set and a mathematical combination, and returns the if-part of a candidate rule. For example, if a frequent item-set is (1 3 4 6 8) and a combination is (0, 2), the item values at indices 0 and 2 are extracted giving an antecedent of (1 4):

``````static int[] MakeAntecedent(int[] itemSet, int[] comb)
{
int[] result = new int[comb.Length];
for (int i = 0; i < comb.Length; ++i) {
int idx = comb[i];
result[i] = itemSet[idx];
}
return result;
}
``````

Although short, the code can be a bit confusing because integers represent combination element values, item-set item values and item-set index values. If you trace through an example or two by hand, you should see how the method works.

Method MakeConsequent generates the then-part for a candidate rule. For example, if a frequent item-set is (1 3 4 6 8) and a combination is (0, 2), the item values at those indices not equal to 0 and 2 are extracted giving a consequent of (3 6 8), as shown here:

``````static int[] MakeConsequent(int[] itemSet, int[] comb)
{
int[] result = new int[itemSet.Length - comb.Length];
int j = 0; // ptr into combination
int p = 0; // ptr into result
for (int i = 0; i < itemSet.Length; ++i) {
if (j < comb.Length && i == comb[j])
++j;
else
result[p++] = itemSet[i];
}
return result;
}
``````

I designed MakeConsequent so that it accepts the same param­eters as method MakeAntecedent. A somewhat simpler but asymmetric alternative is to define MakeConsequent so that it accepts an item-set and an antecedent.

## Counting Occurrences in Transactions

Key helper method CountInTrans accepts an array of int that can represent a frequent item-set or a candidate rule antecedent, a list of transactions, and a Dictionary collection, and returns the number of times the item-set or antecedent occurs. The Dictionary object stores item-set and antecedent counts and is used as a lookup so those item-sets or antecedents that have already been processed don’t need to be recounted:

``````static int CountInTrans(int[] itemSet, List<int[]> trans,
Dictionary<int[], int> countDict)
{
if (countDict.ContainsKey(itemSet) == true)
return countDict[itemSet];
int ct = 0;
for (int i = 0; i < trans.Count; ++i)
if (IsSubsetOf(itemSet, trans[i]) == true)
++ct;
return ct;
}
``````

Most of the work is done by helper method IsSubsetOf. The method takes advantage of the fact that transaction items are assumed to be stored in order, which means after a particular item has been found, the search for the next item can start at the next index position:

``````static bool IsSubsetOf(int[] itemSet, int[] trans)
{
int foundIdx = -1;
for (int j = 0; j < itemSet.Length; ++j) {
foundIdx = IndexOf(trans, itemSet[j], foundIdx + 1);
if (foundIdx == -1) return false;
}
return true;
}
``````

Helper method IndexOf also takes advantage of the ordered property of transactions to early-exit when at an index that’s past the point where a target item could possibly be:

``````static int IndexOf(int[] array, int item, int startIdx)
{
for (int i = startIdx; i < array.Length; ++i) {
if (i > item) return -1;
if (array[i] == item) return i;
}
return -1;
}
``````

## Generating High-Confidence Rules

With all the helper methods in place, method GetHighConfRules can be defined without too much difficulty. Figure 4 shows the method in high-level pseudocode.

Figure 4 Pseudocode for Method GetHighConfRules

``````for each frequent item-set
ctItemSet = count times item-set is in transactions
for subset length 1 to item-set length - 1
create a new math combination
loop over each possible math combination
create candidate rule if-part
create candidate rule then-part
compute confidence of candidate rule
if candidate rule meets min confidence, save
end loop
end for
end for
return saved rules
``````

The implementation of method GetHighConfRules is listed in Figure 5. Although you can use method GetHighConfRules as listed in Figure 5, there are many opportunities to enhance performance, usually at the expense of memory or code clarity. For example, if you refer to Figure 2, you’ll observe that each candidate rule antecedent-consequent pair has a mirror candidate rule with the antecedent and consequent switched. For example, if an item-set is (2 5 7 9), then one candidate rule with subset size k = 2 is IF (2 7) THEN (5 9) and another candidate rule is IF (5 9) THEN (2 7). So instead of computing antecedents for all possible values of frequent item-set subset size, you can just compute antecedents and consequents for half the possible subset sizes.

Figure 5 Method GetHighConfRules

``````static List<Rule> GetHighConfRules(List<int[]> freqItemSets,
List<int[]> trans, double minConfidencePct)
{
List<Rule> result = new List<Rule>();
Dictionary<int[], int> itemSetCountDict =
new Dictionary<int[], int>();
for (int i = 0; i < freqItemSets.Count; ++i)
{
int[] currItemSet = freqItemSets[i]; // for clarity
int ctItemSet = CountInTrans(currItemSet, trans, itemSetCountDict);
for (int len = 1; len <= currItemSet.Length - 1; ++len)
{
int[] c = NewCombination(len);
while (c != null) // each combination makes a candidate rule
{
int[] ante = MakeAntecedent(currItemSet, c);
int[] cons = MakeConsequent(currItemSet, c); // could defer
int ctAntecendent = CountInTrans(ante, transactions,
itemSetCountDict);
double confidence = (ctItemSet * 1.0) / ctAntecendent;
if (confidence >= minConfidencePct) {
Rule r = new Rule(ante, cons, confidence);
}
c = NextCombination(c, currItemSet.Length);
} // while each combination
} // len each possible antecedent for curr item-set
} // i each freq item-set
return result;
}
``````

Also, method CountInTrans uses a lookup dictionary of saved counts. This is helpful when counting antecedent occurrences because different frequent item-sets can generate the same ante­cedents. But if frequent item-sets are unique, then the check in the lookup dictionary is a waste of time. Notice the Dictionary parameter is both used and updated so you might want to define it as a ref parameter to make its dual purpose explicit. If you trace through method GetHighConfRules, you’ll see several other ways to modify the code.

One important optimization possibility takes advantage of the fact that, for a given item-set, if a candidate rule doesn’t meet the minimum confidence threshold, then any other candidate rule that contains the first rule’s consequent can’t meet the confidence threshold. For example, suppose a rule IF (2 4 8 9) THEN (3 5) doesn’t meet minimum confidence. Then any rule that has (3 5) in its consequent, for example, IF (4 8 9) THEN (2 3 5), will not meet minimum confidence, either.

## Wrapping Up

The explanation of association rules and demo code presented in this article should get you up and running if you want to experiment with extracting high-confidence association rules, or create a stand-alone association rule utility program, or add association rule functionality to a software application program.

Unlike several machine learning techniques that are intended to make predictions, association rule learning is an exploratory technique intended to reveal interesting and possibly useful relationships between items. This means you’ll have to use a bit of trial and error when finding association rules.

There’s a large body of research literature about association rule learning. If you’re interested in the theory, I recommend chapter 6 of the book, “Introduction to Data Mining” (Addison-Wesley, 2005) by P. Tan, M. Steinbach, and V. Kumar. Chapter 6 is available for free in PDF format at bit.ly/1m3dbZ9.

Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Internet Explorer and Bing. Dr. McCaffrey can be reached at jammc@microsoft.com.

Thanks to the following Microsoft technical expert for reviewing this article: Richard Hughes and Kirk Olynik