# AveragedPerceptronTrainer Class

## Definition

The IEstimator<TTransformer> to predict a target using a linear binary classification model trained with the averaged perceptron.

```csharp
public sealed class AveragedPerceptronTrainer : Microsoft.ML.Trainers.AveragedLinearTrainer<Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.Trainers.LinearBinaryModelParameters>,Microsoft.ML.Trainers.LinearBinaryModelParameters>
```

```fsharp
type AveragedPerceptronTrainer = class
    inherit AveragedLinearTrainer<BinaryPredictionTransformer<LinearBinaryModelParameters>, LinearBinaryModelParameters>
```

```vb
Public NotInheritable Class AveragedPerceptronTrainer
Inherits AveragedLinearTrainer(Of BinaryPredictionTransformer(Of LinearBinaryModelParameters), LinearBinaryModelParameters)
```

## Remarks

To create this trainer, use AveragedPerceptron or AveragedPerceptron(Options).

### Input and Output Columns

The input label column data must be Boolean. The input features column data must be a known-sized vector of Single. This trainer outputs the following columns:

| Output Column Name | Column Type | Description |
| --- | --- | --- |
| `Score` | Single | The unbounded score that was calculated by the model. |
| `PredictedLabel` | Boolean | The predicted label, based on the sign of the score. A negative score maps to `false` and a positive score maps to `true`. |

### Trainer Characteristics

| Characteristic | Value |
| --- | --- |
| Machine learning task | Binary classification |
| Is normalization required? | Yes |
| Is caching required? | No |
| Required NuGet in addition to Microsoft.ML | None |
| Exportable to ONNX | Yes |

### Training Algorithm Details

The perceptron is a classification algorithm that makes its predictions by finding a separating hyperplane. For instance, with feature values $f_0, f_1,..., f_{D-1}$, the prediction is made by determining which side of the hyperplane the point falls on. That is the same as the sign of the features' weighted sum, i.e. $\sum_{i = 0}^{D-1} (w_i * f_i) + b$, where $w_0, w_1,..., w_{D-1}$ are the weights computed by the algorithm and $b$ is the bias computed by the algorithm.
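The decision rule above can be sketched in plain Python (an illustrative sketch only; the `predict` function and its parameters are hypothetical and not part of the ML.NET API):

```python
# Perceptron decision rule: the sign of the features' weighted sum plus bias.
def predict(weights, bias, features):
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return score > 0  # positive score -> True, negative score -> False

# Example: score = 2*1 + (-1)*1 + 0.5 = 1.5, so the point is classified True
print(predict([2.0, -1.0], 0.5, [1.0, 1.0]))
```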

The perceptron is an online algorithm, meaning it processes the instances in the training set one at a time. It starts with a set of initial weights (zero, random, or initialized from a previous learner). Then, for each example in the training set, the weighted sum of the features is computed. If this value has the same sign as the label of the current example, the weights remain unchanged. If the signs differ, the weights vector is updated by either adding or subtracting (if the label is positive or negative, respectively) the feature vector of the current example, multiplied by a factor $0 < a \le 1$, called the learning rate. In a generalization of this algorithm, the weights are updated by adding the feature vector multiplied by the learning rate and by the gradient of some loss function (in the specific case described above, the loss is the hinge loss, whose gradient is 1 wherever it is non-zero).
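The update rule described above can be sketched as follows (an illustrative sketch of the textbook algorithm with labels encoded as +1/-1, not the ML.NET implementation; all names are hypothetical):

```python
def train_perceptron(examples, learning_rate=1.0, epochs=20):
    """Illustrative online perceptron; each example is (features, label)
    with label in {+1, -1}."""
    dim = len(examples[0][0])
    w = [0.0] * dim  # initial weights: zero
    b = 0.0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(wi * fi for wi, fi in zip(w, features)) + b
            # No change when the score already has the label's sign;
            # otherwise add/subtract the scaled feature vector.
            if label * score <= 0:
                w = [wi + learning_rate * label * fi
                     for wi, fi in zip(w, features)]
                b += learning_rate * label
    return w, b

# A linearly separable toy set: positive only when both features are 1
data = [([1.0, 1.0], 1), ([0.0, 0.0], -1), ([1.0, 0.0], -1), ([0.0, 1.0], -1)]
w, b = train_perceptron(data)
```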

In the averaged perceptron (also known as the voted perceptron), a weight vector is calculated for each iteration, i.e. each pass through the training data, as explained above. The final prediction is then calculated by averaging the weighted sums from each weight vector and looking at the sign of the result.
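Because the score is linear in the weights, averaging the per-step scores is equivalent to scoring once with the averaged weights. The averaging step can be sketched the same way (again an illustrative sketch with hypothetical names, not the ML.NET implementation):

```python
def train_averaged_perceptron(examples, learning_rate=1.0, epochs=30):
    """Illustrative averaged perceptron: keeps a running sum of the
    weights after every example and returns their average."""
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    w_sum, b_sum, steps = [0.0] * dim, 0.0, 0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(wi * fi for wi, fi in zip(w, features)) + b
            if label * score <= 0:  # mistake: apply the perceptron update
                w = [wi + learning_rate * label * fi
                     for wi, fi in zip(w, features)]
                b += learning_rate * label
            # Accumulate the current weights into the running average
            w_sum = [ws + wi for ws, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [ws / steps for ws in w_sum], b_sum / steps

data = [([1.0, 1.0], 1), ([0.0, 0.0], -1), ([1.0, 0.0], -1), ([0.0, 1.0], -1)]
w_avg, b_avg = train_averaged_perceptron(data)
```

Averaging damps the oscillations of the raw online updates, which is why the averaged model often generalizes better than the last weight vector alone.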

For more information, see the Wikipedia entry for Perceptron or the paper Large Margin Classification Using the Perceptron Algorithm.

Check the See Also section for links to usage examples.

## Fields

- The feature column that the trainer expects. (Inherited from TrainerEstimatorBase)
- The label column that the trainer expects. Can be null, which indicates that label is not used for training. (Inherited from TrainerEstimatorBase)
- The weight column that the trainer expects. Can be null, which indicates that weight is not used for training. (Inherited from TrainerEstimatorBase)

## Properties

 (Inherited from OnlineLinearTrainer)

## Methods

- Trains and returns an ITransformer. (Inherited from TrainerEstimatorBase)
- Continues the training of an OnlineLinearTrainer using an already trained modelParameters and returns an ITransformer. (Inherited from OnlineLinearTrainer)
- (Inherited from TrainerEstimatorBase)

## Extension Methods

- Append a 'caching checkpoint' to the estimator chain. This will ensure that the downstream estimators will be trained against cached data. It is helpful to have a caching checkpoint before trainers that take multiple data passes.
- Given an estimator, return a wrapping object that will call a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object, rather than just a general ITransformer. However, at the same time, estimators are often formed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain where the estimator for which we want to get the transformer is buried somewhere in this chain. For that scenario, this method lets us attach a delegate that will be called once Fit(IDataView) is called.