September 2014

Volume 29 Number 9

Test Run: Winnow Classification Using C#

James McCaffrey

In machine learning (ML), a classification problem is one in which the goal is to create a model that predicts a result that takes on discrete, non-numeric values. For example, you might want to predict the political party (Democrat or Republican) of a person. There are many different classification algorithms and techniques, including naive Bayes classification, logistic regression classification, and neural network classification.

A very simple and interesting classification technique uses the Winnow algorithm. (The English word winnow means to remove unwanted items.) The best way to get a feel for what Winnow classification is, and to see where this article is headed, is to take a look at the demo program in Figure 1.

Figure 1 Winnow Algorithm Classification Demo

The goal of the demo program is to predict the political party, Democrat or Republican, of a member of the U.S. House of Representatives, based on the Representative’s votes on 16 bills. The House of Representatives has 435 members. A well-known benchmark data set contains 435 items stored in a simple text file. The first three data items are:

republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n

The first column in a line is either democrat or republican. The next 16 columns are either n (a no vote), y (a yes vote), or ? (missing, unknown or abstain). The first of these 16 columns represents the member’s vote on a bill related to handicapped infants. The second is for a bill related to a water project. And so on. You don’t need to know which bill is associated with each column, but if you’re interested, you can find this data set (with descriptions) in many places on the Internet by searching for “UCI voting data.”

Winnow classification is designed for a very specific type of classification problem: one where the class to predict can take one of two possible values (these are called binary classification problems), and the predictor variables (often called “features” in ML terminology) also can take one of two possible values. The raw voting data set meets these requirements if the missing values are dealt with. The most common approach for dealing with missing values is simply to delete any data item that has one or more missing values. However, in the case of voting data, a missing value often represents an implied “no” vote, so the demo program replaces all missing values with “no” votes.

The demo program uses the first 100 data items in the voting set rather than all 435 items. The demo encodes independent variable “no” votes as 0 and “yes” votes as 1, and encodes the dependent political party Y variable values “democrat” as 0 and “republican” as 1. The demo moves the dependent Y variable from the first column to the last column because using the last column for Y is more convenient when coding, and more common.

The demo randomly splits the 100-item encoded data set into an 80-item training set to generate the model, and a 20-item test set to evaluate the accuracy of the model. The demo uses the Winnow algorithm to find 16 weights (one for each input): 0.2500, 0.5000, . . . , 0.0625. Using these weights, the demo model correctly predicts the political party of 97.50 percent of the training items (78 out of 80) and 95.00 percent of the test items (19 out of 20).

The demo concludes by using the model to predict the political party of a hypothetical Representative who voted “yes” on all 16 bills. Based on the model, the prediction is that the person is a Republican.

This article assumes you have at least intermediate programming skills but doesn’t assume you know anything about the Winnow classification algorithm. The demo is coded using C#, but you shouldn’t have much trouble if you want to refactor the code to another language, such as Visual Basic .NET or JavaScript.

Understanding the Winnow Algorithm

For simplicity, suppose the training data is based on just five votes rather than 16. For example:

1, 0, 1, 1, 0 -> 0
1, 1, 1, 0, 1 -> 1
0, 0, 0, 1, 0 -> 1

So, the first line means “yes-vote,” “no-vote,” “yes-vote,” “yes-vote,” “no-vote,” “democrat.” Now suppose that a model is trained and yields weights { 0.75, 3.00, 1.00, 0.50, 0.40 }. To compute a predicted output, the algorithm just multiplies each input value by the corresponding weight, sums the terms and then compares the accumulated sum against a threshold value. For example, for the first item, the accumulated sum is:

(1 * 0.75) + (0 * 3.00) + (1 * 1.00) + (1 * 0.50) + (0 * 0.40) = 2.25

If the threshold value is 5.0 then, because the accumulated sum (2.25) is less than the threshold, the predicted Y is 0, which in this case is a correct prediction.
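
Expressed in code, this prediction computation might look like the following minimal sketch (the variable names here are illustrative, not part of the demo program):

int[] xValues = new int[] { 1, 0, 1, 1, 0 }; // First item's inputs
double[] weights = new double[] { 0.75, 3.00, 1.00, 0.50, 0.40 };
double threshold = 5.0;
double sum = 0.0;
for (int i = 0; i < xValues.Length; ++i)
  sum += xValues[i] * weights[i]; // Accumulates 2.25
int predictedY = (sum > threshold) ? 1 : 0; // 2.25 < 5.0, so Y = 0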

Training a model involves finding a set of weights so that when applied to training data with known output values, the computed outputs closely match the known outputs. In high-level pseudo-code the Winnow algorithm is:

initialize weights
loop until done
  for each training item
    compute Y
    if correct do nothing
    else if incorrect
      if computed Y is too large
        divide all relevant weights by 2.0
      else if computed Y is too small
        multiply all relevant weights by 2.0
  end for
end loop

Suppose for the three data items mentioned earlier, weights are initialized to:

{ 2.50, 2.50, 2.50, 2.50, 2.50 }

The computed Y for the first training item (1, 0, 1, 1, 0 -> 0) is:

Y = (1 * 2.50) + (0 * 2.50) + (1 * 2.50) + (1 * 2.50) + (0 * 2.50) = 7.50 -> 1

This is an incorrect prediction where the computed Y of 1 is too big compared to the desired Y of 0, so the relevant weights are divided by a factor of 2.0. The relevant weights are those associated with inputs that have value 1. The new weights are:

{ 2.50 / 2, (no change), 2.50 / 2, 2.50 / 2, (no change) }
= { 1.25, 2.50, 1.25, 1.25, 2.50 }

Now, the second training item (1, 1, 1, 0, 1 -> 1) is processed using the new weights:

Y = (1 * 1.25) + (1 * 2.50) + (1 * 1.25) + (0 * 1.25) + (1 * 2.50) = 7.50 -> 1

This is a correct prediction, so all weights are left alone. The third training item (0, 0, 0, 1, 0 -> 1) is processed using the current weights:

Y = (0 * 1.25) + (0 * 2.50) + (0 * 1.25) + (1 * 1.25) + (0 * 2.50) = 1.25 -> 0

This is an incorrect prediction where the computed Y of 0 is too small compared to the desired Y of 1, so relevant weights are multiplied by 2.0:

{ (no change), (no change), (no change), 1.25 * 2.0, (no change) }
= { 1.25, 2.50, 1.25, 2.50, 2.50 }

If you think about the Winnow training process a bit, you’ll notice two key characteristics. First, the algorithm learns only from incorrect predictions; correct predictions leave the weights unchanged. Second, the algorithm essentially drives the weights of irrelevant predictors toward 0, winnowing them out. This often makes Winnow classification extremely effective in situations where many of the predictor variables are irrelevant. Although the basic Winnow algorithm is quite simple, there are several implementation details to settle, such as deciding how to initialize the weights and determining when to stop training.

Overall Program Structure

The overall structure of the demo program, with a few WriteLine statements removed and minor edits to save space, is presented in Figure 2. To create the demo program, I launched Visual Studio and created a new C# console application named WinnowPredict. The demo has no significant .NET version dependencies, so any version of Visual Studio should work. After the template code loaded into the editor, I deleted all using statements except for the single reference to the top-level System namespace. In the Solution Explorer window, I renamed file Program.cs to WinnowProgram.cs and Visual Studio automatically renamed class Program for me.

Figure 2 Overall Demo Program Structure

using System;
namespace WinnowPredict
{
  class WinnowProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Begin Winnow algorithm demo");
      int[][] data = new int[100][];
      data[0] = new int[] { 0, 1, 0, 1, 1, 1, 0, 0,
        0, 1, 0, 1, 1, 1, 0, 1, 1 };
      // Etc.
      // Create, train and test matrices from data
      // Instantiate Winnow classifier
      // Train using training data
      // Compute accuracy
      // Predict party of hypothetical person
      Console.WriteLine("End Winnow demo");
      Console.ReadLine();
    } // Main
    static void MakeTrainTest(int[][] data, int seed,
      out int[][] trainData, out int[][] testData) { . . }
    static void ShowVector(double[] vector, int decimals,
      int valsPerRow, bool newLine) { . . }
    static void ShowMatrix(int[][] matrix, int numRows,
      bool indices) { . . }
  }
  public class Winnow
  {
    private int numInput;
    private double[] weights;
    private double threshold;
    private double alpha;
    private static Random rnd;
    public Winnow(int numInput, int rndSeed) { . . }
    public int ComputeOutput(int[] xValues) { . . }
    public double[] Train(int[][] trainData) { . . }
    public double Accuracy(int[][] trainData) { . . }
    private static void Shuffle(int[][] trainData) { . . }
  }
}

The demo program is a bit too long to present in its entirety in this article, but you can find the complete source code (WinnowProgram.cs) in the accompanying file download. All the Winnow classification logic is contained in a program-defined class named Winnow.

The Main method begins by setting up a 100-item data set, as shown in Figure 3.

Figure 3 The Main Method Setting up a 100-Item Data Set

using System;
namespace WinnowPredict
{
  class WinnowProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Begin Winnow algorithm prediction demo");
      int[][] data = new int[100][];
      data[0] = new int[] { 0, 1, 0, 1, 1, 1, 0, 0,
        0, 1, 0, 1, 1, 1, 0, 1, 1 };
...

To create the data, I located the UCI voting data set on the Web, copied and pasted it into Notepad, and then used edit-replace, placing the data directly into an array-of-arrays-style matrix. Even if you’re an experienced programmer, you might not be familiar with C# matrix syntax, but you’ll likely get used to the coding patterns quickly. In a non-demo scenario, you’d write some sort of method to load the data from the text file into a matrix.
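
For example, a minimal loader, assuming the raw file format shown earlier (the party in the first column, followed by 16 votes) and the encoding used by the demo (n and ? become 0, y becomes 1, and the party value moves to the last column), might look like this sketch:

// Sketch of a loader for the raw UCI voting data; assumes each line
// is like "republican,n,y,n,..." with exactly 16 vote columns
static int[][] LoadData(string file, int numRows)
{
  string[] lines = System.IO.File.ReadAllLines(file);
  int[][] result = new int[numRows][];
  for (int i = 0; i < numRows; ++i)
  {
    string[] tokens = lines[i].Split(',');
    result[i] = new int[17]; // 16 inputs + 1 target
    for (int j = 1; j <= 16; ++j) // n and ? -> 0, y -> 1
      result[i][j - 1] = (tokens[j] == "y") ? 1 : 0;
    result[i][16] = (tokens[0] == "republican") ? 1 : 0; // Target last
  }
  return result;
}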

Next, the first four lines and last line of the entire data set are displayed using a helper method, and the data set is split into a training set (80 items) and test set (20 items):

Console.WriteLine("\nFirst few lines of all data are: \n");
ShowMatrix(data, 4, true);
Console.WriteLine("\nSplitting data into 80% train" +
  " and 20% test matrices");
int[][] trainData = null;
int[][] testData = null;
MakeTrainTest(data, 0, out trainData, out testData);

Method MakeTrainTest has a hardcoded 80 percent to 20 percent split (you might want to parameterize the percentage used as training data).
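
The complete MakeTrainTest is in the code download. A minimal sketch, assuming the rows are scrambled with a Fisher-Yates shuffle seeded by the seed parameter and then split 80-20, is:

// Sketch of MakeTrainTest: shuffle row references, then split 80-20
static void MakeTrainTest(int[][] data, int seed,
  out int[][] trainData, out int[][] testData)
{
  Random rnd = new Random(seed);
  int[][] copy = new int[data.Length][];
  Array.Copy(data, copy, data.Length); // Copy row references
  for (int i = copy.Length - 1; i > 0; --i) // Fisher-Yates shuffle
  {
    int r = rnd.Next(0, i + 1);
    int[] tmp = copy[r]; copy[r] = copy[i]; copy[i] = tmp;
  }
  int numTrain = (int)(0.80 * data.Length); // 80 items
  trainData = new int[numTrain][];
  testData = new int[data.Length - numTrain][];
  Array.Copy(copy, 0, trainData, 0, numTrain);
  Array.Copy(copy, numTrain, testData, 0, data.Length - numTrain);
}

After displaying part of the training data to verify nothing went wrong, the demo instantiates a Winnow classifier object and uses the Train method to create a model, that is, to find a set of good weights: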

Console.WriteLine("First few rows of training data are:");
ShowMatrix(trainData, 3, true);
Console.WriteLine("Begin training using Winnow algorithm");
int numInput = 16;
Winnow w = new Winnow(numInput, 0);
double[] weights = w.Train(trainData);
Console.WriteLine("Training complete");

The Winnow constructor requires the number of x-variables and a seed value for random number generation, which is used to process the training items in random order. Next, the demo displays the final weights found by the Train method, and then computes the model’s accuracy on the training and test sets:

Console.WriteLine("Final model weights are:");
ShowVector(weights, 4, 8, true);
double trainAcc = w.Accuracy(trainData);
double testAcc = w.Accuracy(testData);
Console.WriteLine("Prediction accuracy on training data = " +
  trainAcc.ToString("F4"));
Console.WriteLine("Prediction accuracy on test data = " +
  testAcc.ToString("F4"));

The demo concludes by predicting the political party of a hypothetical U.S. House Representative who voted “yes” on all 16 bills, as shown in Figure 4.

Figure 4 Predicting the Political Party

...
  Console.WriteLine("Predicting party of Representative" + "
    with all 'yes' votes");
  int[] unknown = new int[] { 1, 1, 1, 1, 1, 1, 1, 1,
  1 , 1, 1, 1, 1, 1, 1, 1 };
  int predicted = w.ComputeOutput(unknown);
  if (predicted == 0)
    Console.WriteLine("Prediction is 'democrat'");
  else
    Console.WriteLine("Prediction is 'republican'");
  Console.WriteLine("End Winnow demo");
  Console.ReadLine();
} // Main

The Winnow Class

The Winnow class has five data members:

private int numInput;
private double[] weights;
private double threshold;
private double alpha;
private static Random rnd;

Data member threshold holds the value used to determine whether a computed Y is 0 (below the threshold) or 1 (above the threshold). Data member alpha is the factor used to decrease (called demotion in the research literature) or increase (called promotion) weights when an incorrect Y is computed. The standard value for alpha is 2.0, as shown earlier in this article. You may want to experiment with two different values for promotion and demotion, instead of a single alpha value for both.

The Winnow constructor is defined like so:

public Winnow(int numInput, int rndSeed)
{
  this.numInput = numInput;
  this.weights = new double[numInput];
  for (int i = 0; i < weights.Length; ++i)
    weights[i] = numInput / 2.0;
  this.threshold = 1.0 * numInput;
  this.alpha = 2.0;
  rnd = new Random(rndSeed);
}

The threshold value is initialized to the number of input variables, for example 16.0 in the demo. This value is standard for the Winnow algorithm, but you might want to experiment with alternative values. The no-effect multiplication by 1.0 is there to suggest that different values can be used, but they should be related to the number of input variables.

The weights array is allocated using the number of input variables, 16 in the demo. Each weight is initialized to the number of input variables divided by 2, or 8.0 in the demo. This value is arbitrary and, again, you might want to experiment. The Winnow algorithm hasn’t been explored much by researchers relative to other, more commonly used classification algorithms.

Method ComputeOutput is straightforward:

public int ComputeOutput(int[] xValues)
{
  double sum = 0.0;
  for (int i = 0; i < numInput; ++i)
    sum += weights[i] * xValues[i];
  if (sum > this.threshold)
    return 1;
  else
    return 0;
}

Notice that a 1 value is returned if the accumulated sum is strictly greater than the threshold value, as opposed to greater than or equal to the threshold. This is arbitrary and using “>=” instead of “>” will not have a significant impact on the algorithm.

The Training Method

The Winnow algorithm training routine is one of the shortest and simplest of all ML classification algorithms. The method definition begins:

public double[] Train(int[][] trainData)
{
  int[] xValues = new int[numInput];
  int target;
  int computed;
  Shuffle(trainData);
...

Local array “xValues” holds the input values from a training item, but not the target output value. Local variable “target” holds the desired Y value from a training data item. Local variable “computed” stores the computed Y value, for a given set of input x-values and the current set of weights. Method Shuffle randomizes the order of the training data using the Fisher-Yates mini-algorithm.
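
Method Shuffle isn’t listed in the article body, but a minimal sketch consistent with its declaration, using the class’s static rnd member, is:

// Sketch of Shuffle: Fisher-Yates reordering of the rows of trainData
private static void Shuffle(int[][] trainData)
{
  for (int i = trainData.Length - 1; i > 0; --i)
  {
    int r = rnd.Next(0, i + 1);
    int[] tmp = trainData[r];
    trainData[r] = trainData[i];
    trainData[i] = tmp;
  }
}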

Next, the main training loop begins:

for (int i = 0; i < trainData.Length; ++i)
{
  Array.Copy(trainData[i], xValues, numInput); // Inputs
  target = trainData[i][numInput]; // Last value is target
  computed = ComputeOutput(xValues);
  // Update weights here
}

As mentioned earlier, it’s not entirely clear how long to train. The demo iterates just once through the training data. An alternative is to iterate multiple times through the training data, stopping after a fixed number of iterations, or perhaps when some desired accuracy has been reached.
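
For example, a hedged multi-pass variation (maxEpochs is a name I’ve introduced, not part of the demo) might look like:

// Sketch: iterate several times through the training data
int maxEpochs = 10; // Hypothetical stopping value
for (int epoch = 0; epoch < maxEpochs; ++epoch)
{
  Shuffle(trainData); // New processing order each pass
  for (int i = 0; i < trainData.Length; ++i)
  {
    Array.Copy(trainData[i], xValues, numInput);
    target = trainData[i][numInput];
    computed = ComputeOutput(xValues);
    // Update weights here (see Figure 5)
  }
}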

Inside the main training loop, weights are updated using the code in Figure 5.

Figure 5 Updating Weights

if (computed == 1 && target == 0) // Decrease
{
  for (int j = 0; j < numInput; ++j)
  {
    if (xValues[j] == 0) continue; // Not relevant
    weights[j] = weights[j] / alpha; // Demotion
  }
}
else if (computed == 0 && target == 1) // Increase
{
  for (int j = 0; j < numInput; ++j)
  {
    if (xValues[j] == 0) continue; // Not relevant
    weights[j] = weights[j] * alpha; // Promotion
  }
}

Method Train concludes by copying the final weights into a return array:

...
  } // end for
  double[] result = new double[numInput]; // Number of weights
  Array.Copy(this.weights, result, numInput);
  return result;
} // Train
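
Method Accuracy isn’t shown in the article body, but a minimal sketch consistent with how it’s called, returning the fraction of items predicted correctly, is:

// Sketch of Accuracy: fraction of data items predicted correctly
public double Accuracy(int[][] data)
{
  int numCorrect = 0;
  int[] xValues = new int[numInput];
  for (int i = 0; i < data.Length; ++i)
  {
    Array.Copy(data[i], xValues, numInput); // Inputs
    int target = data[i][numInput]; // Last value is the target
    if (ComputeOutput(xValues) == target)
      ++numCorrect;
  }
  return (numCorrect * 1.0) / data.Length;
}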

Wrapping Up

This article is based on the research paper that originally presented the Winnow algorithm, “Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm” (1988) by N. Littlestone. You can find this paper in PDF format in several locations on the Web.

While doing background research for this article, I came across an interesting implementation of the Winnow algorithm written using the F# language. The implementation is in a personal blog post written by Mathias Brandewinder (bit.ly/1z8hfj6).

Although Winnow classification is designed specifically for binary classification problems where the independent x-variables are all binary, you might want to experiment on problems where one or more x-variables are categorical. For example, suppose one predictor variable is “color” and can take one of three possible values: “red,” “green,” or “blue.” If you use 1-of-N encoding, red becomes (1, 0, 0), green becomes (0, 1, 0), blue becomes (0, 0, 1), and the Winnow algorithm can be applied.
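
For instance, a small hedged helper (the method name and color values are illustrative) that performs the 1-of-N encoding might look like:

// Sketch: 1-of-N encode a three-value categorical variable
static int[] EncodeColor(string color)
{
  if (color == "red") return new int[] { 1, 0, 0 };
  else if (color == "green") return new int[] { 0, 1, 0 };
  else if (color == "blue") return new int[] { 0, 0, 1 };
  else throw new Exception("Unknown color value: " + color);
}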


Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Internet Explorer and Bing. He can be reached at jammc@microsoft.com.

Thanks to the following technical experts at Microsoft Research for reviewing this article: Nathan Brown and Kirk Olynyk