
FieldAwareFactorizationMachineTrainer Class


The IEstimator<TTransformer> to predict a target using a field-aware factorization machine model trained using a stochastic gradient method.

public sealed class FieldAwareFactorizationMachineTrainer : Microsoft.ML.IEstimator<Microsoft.ML.Trainers.FieldAwareFactorizationMachinePredictionTransformer>
type FieldAwareFactorizationMachineTrainer = class
    interface IEstimator<FieldAwareFactorizationMachinePredictionTransformer>
Public NotInheritable Class FieldAwareFactorizationMachineTrainer
Implements IEstimator(Of FieldAwareFactorizationMachinePredictionTransformer)


Input and Output Columns

The input label column data must be Boolean. The input features column data must be a known-sized vector of Single.

This trainer outputs the following columns:

Output column name (column type): description
Score (Single): The unbounded score calculated by the model.
PredictedLabel (Boolean): The predicted label, based on the sign of the score. A negative score maps to false and a positive score maps to true.
Probability (Single): The probability of the label being true, obtained by calibrating the score. The value lies in the range [0, 1].

To create this trainer, use FieldAwareFactorizationMachine (either of its column-name overloads) or FieldAwareFactorizationMachine(Options).

Unlike other binary classifiers, which support only one feature column, the field-aware factorization machine can consume multiple feature columns. Each column is treated as a container of features, and such a container is called a field. All feature columns must be float vectors, but their dimensions can differ. Splitting features into fields lets the model treat features drawn from different distributions independently. For example, in an online game store, features derived from the user profile and features derived from the game profile can be assigned to two different fields.
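To picture how fields relate to the single concatenated feature vector used by the scoring function, the following C# sketch concatenates several feature columns and records, for each position, the index of the field it came from. The helper name ConcatenateFields and the sample columns are hypothetical; this is only a way to visualize the layout, not how ML.NET stores fields internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: concatenates feature columns (fields) into one vector and
// records F(j), the index of the field that position j came from.
static (float[] X, int[] Field) ConcatenateFields(IReadOnlyList<float[]> columns)
{
    var values = new List<float>();
    var fields = new List<int>();
    for (int f = 0; f < columns.Count; f++)
    {
        foreach (float value in columns[f])
        {
            values.Add(value);
            fields.Add(f); // every entry of column f belongs to field f
        }
    }
    return (values.ToArray(), fields.ToArray());
}

// Hypothetical columns: a 3-dimensional user-profile field and a 2-dimensional game-profile field.
float[] userProfile = { 1f, 0f, 0.5f };
float[] gameProfile = { 0f, 1f };
var (features, fieldOf) = ConcatenateFields(new[] { userProfile, gameProfile });
Console.WriteLine(string.Join(",", fieldOf)); // 0,0,0,1,1
```

The two columns have different lengths, which is allowed; only the field index distinguishes where each entry came from.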

Trainer Characteristics

Machine learning task: Binary classification
Is normalization required? Yes
Is caching required? No
Required NuGet in addition to Microsoft.ML: None
Exportable to ONNX: No


The factorization machine family is a powerful group of models for supervised learning problems. It was first introduced in Steffen Rendle's 2010 paper Factorization Machines. Later, one of its generalizations, the field-aware factorization machine, became an important predictive component in recent recommender systems and click-through rate prediction contests. For examples, see the winning solutions in Steffen Rendle's KDD-Cup 2012 (Track 1 and Track 2) and in the Criteo, Avazu, and Outbrain click prediction challenges on Kaggle.

Factorization machines are especially powerful when feature conjunctions are strongly correlated with the signal you want to predict. For example, in music recommendation, the user ID and the music ID form an important conjunction. When a dataset consists only of dense numerical features, a factorization machine is not recommended; alternatively, some featurization should be performed first.
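This page does not prescribe a specific featurization, but one common choice for ID-like features is one-hot encoding. The C# sketch below encodes a user ID and a music ID as two one-hot fields. Each example then has exactly one nonzero entry per field, so exactly one cross-field product of feature values is nonzero: the conjunction of that particular user and song. The helper OneHot and the vocabulary sizes are hypothetical, chosen only for illustration.

```csharp
using System;

// Hypothetical helper: one-hot encodes a categorical ID into a field of the given size.
static float[] OneHot(int id, int vocabularySize)
{
    var field = new float[vocabularySize];
    field[id] = 1f;
    return field;
}

// Hypothetical vocabularies: 4 users and 3 songs; the example is user 2 with song 1.
float[] userField = OneHot(2, 4);
float[] musicField = OneHot(1, 3);

// Count the nonzero cross-field products x_j * x_j' (j in the user field, j' in the music field).
int activePairs = 0;
foreach (float u in userField)
    foreach (float m in musicField)
        if (u * m != 0f) activePairs++;

Console.WriteLine(activePairs); // 1: only the (user 2, song 1) conjunction is active
```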

Scoring Function

The field-aware factorization machine is a scoring function that maps feature vectors from different fields to a scalar score. Assume that all $m$ feature columns are concatenated into a long feature vector $\textbf{x} \in {\mathbb R}^n$ and that ${\mathcal F}(j)$ denotes the $j$-th feature's field identifier. The corresponding score is $\hat{y}(\textbf{x}) = \langle \textbf{w}, \textbf{x} \rangle + \sum_{j = 1}^n \sum_{j' = j + 1}^n \langle \textbf{v}_{j, {\mathcal F}(j')}, \textbf{v}_{j', {\mathcal F}(j)} \rangle x_j x_{j'}$, where $\langle \cdot, \cdot \rangle$ is the inner product operator, $\textbf{w} \in {\mathbb R}^n$ stores the linear coefficients, and $\textbf{v}_{j, f}\in {\mathbb R}^k$ is the $j$-th feature's representation in the $f$-th field's latent space. Note that $k$ is the latent dimension specified by the user.
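The following C# sketch evaluates $\hat{y}(\textbf{x})$ directly from the formula above, assuming the parameters are stored as plain arrays: w[j] holds $w_j$ and v[j][f] holds $\textbf{v}_{j,f}$, with zero-based indices. The name FfmScore, this storage layout, and the sample numbers are illustrative assumptions, not ML.NET's internal representation.

```csharp
using System;

// Hypothetical FFM scorer:
//   yhat(x) = <w, x> + sum over j < j' of <v[j][F(j')], v[j'][F(j)]> * x_j * x_j'
// x[j]     : j-th entry of the concatenated feature vector (length n)
// field[j] : F(j), the field that feature j belongs to
// w[j]     : linear coefficient of feature j
// v[j][f]  : latent vector (length k) of feature j for field f
static float FfmScore(float[] x, int[] field, float[] w, float[][][] v)
{
    int n = x.Length;
    float score = 0f;
    for (int j = 0; j < n; j++)
        score += w[j] * x[j];                    // linear term <w, x>

    for (int j = 0; j < n; j++)
    {
        if (x[j] == 0f) continue;                // zero features add nothing
        for (int jp = j + 1; jp < n; jp++)
        {
            if (x[jp] == 0f) continue;
            float[] a = v[j][field[jp]];         // v_{j, F(j')}
            float[] b = v[jp][field[j]];         // v_{j', F(j)}
            float dot = 0f;
            for (int d = 0; d < a.Length; d++) dot += a[d] * b[d];
            score += dot * x[j] * x[jp];         // pairwise field-aware term
        }
    }
    return score;
}

// Hypothetical example: n = 3 features in 2 fields, latent dimension k = 2.
float[] features = { 1f, 0.5f, 2f };
int[] fieldOf = { 0, 0, 1 };
float[] weights = { 0.1f, -0.2f, 0.3f };
float[][][] latent =
{
    new[] { new[] { 1f, 0f }, new[] { 0.5f, 0.5f } },
    new[] { new[] { 0f, 1f }, new[] { 1f, -1f } },
    new[] { new[] { 2f, 0f }, new[] { 0f, 0f } },
};
float s = FfmScore(features, fieldOf, weights, latent);
Console.WriteLine(s); // about 4.6: linear part 0.6, pairwise parts 0 + 2 + 2
```

Each pair uses a different latent vector for a feature depending on which field its partner belongs to; that is the difference from a plain factorization machine, where each feature has a single latent vector. The double loop costs $O(n^2 k)$ in the worst case, and skipping zero entries keeps it cheap for sparse inputs.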

The predicted label is the sign of $\hat{y}$. If $\hat{y} > 0$, this model predicts true. Otherwise, it predicts false.
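A minimal C# sketch of how a score turns into the outputs listed under Input and Output Columns. The label rule follows the text above exactly, so a score of exactly 0 yields false. For the probability, this page only says the score is calibrated; the sketch assumes a logistic sigmoid $1/(1 + e^{-\hat{y}})$, which is a natural match for a model trained with logistic loss but is not confirmed here. Both helper names are hypothetical.

```csharp
using System;

// Label rule from the text: true only when the score is strictly positive.
static bool PredictLabel(float score) => score > 0f;

// Assumption: calibration modeled as the logistic sigmoid of the score.
static float Sigmoid(float score) => 1f / (1f + MathF.Exp(-score));

foreach (float s in new[] { -2f, 0f, 2f })
    Console.WriteLine($"score={s}: label={PredictLabel(s)}, p={Sigmoid(s):F3}");
```

Under this assumption, a score of 0 sits exactly on the boundary: the probability is 0.5 while the predicted label is false.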

For a systematic introduction to field-aware factorization machines, see this paper.

Training Algorithm Details

The algorithm implemented in FieldAwareFactorizationMachineTrainer is based on a stochastic gradient method. Algorithm details are described in Algorithm 3 of this online document. The loss function being minimized is the logistic loss, so the trained model can be viewed as a non-linear logistic regression.
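To make the training step concrete, here is a hedged C# sketch of one stochastic gradient update on the logistic loss $\log(1 + e^{-y\hat{y}})$, with true mapped to $y = +1$ and false to $y = -1$. It uses plain SGD with a fixed learning rate eta and an L2 penalty lambda, both hypothetical choices; Algorithm 3 in the referenced document may differ, for example in its learning-rate schedule or regularization. All gradients are computed at the current parameters before any update is applied. The names Score, Dot, and SgdStep are illustrative.

```csharp
using System;

// Inner product of two latent vectors.
static float Dot(float[] a, float[] b)
{
    float d = 0f;
    for (int i = 0; i < a.Length; i++) d += a[i] * b[i];
    return d;
}

// FFM score: <w, x> + sum over j < j' of <v[j][F(j')], v[j'][F(j)]> * x_j * x_j'.
static float Score(float[] x, int[] field, float[] w, float[][][] v)
{
    float s = 0f;
    for (int j = 0; j < x.Length; j++)
    {
        s += w[j] * x[j];
        for (int jp = j + 1; jp < x.Length; jp++)
            s += Dot(v[j][field[jp]], v[jp][field[j]]) * x[j] * x[jp];
    }
    return s;
}

// One plain-SGD step on log(1 + exp(-y * yhat)), y = +1 for true and -1 for false.
// eta (learning rate) and lambda (L2 strength) are hypothetical hyperparameters.
static void SgdStep(float[] x, int[] field, bool label, float[] w, float[][][] v,
                    float eta, float lambda)
{
    int n = x.Length;
    float y = label ? 1f : -1f;
    float g = -y / (1f + MathF.Exp(y * Score(x, field, w, v))); // d loss / d yhat

    // Accumulate d loss / d v at the current parameters before changing anything.
    var gradV = new float[n][][];
    for (int j = 0; j < n; j++)
    {
        gradV[j] = new float[v[j].Length][];
        for (int f = 0; f < v[j].Length; f++) gradV[j][f] = new float[v[j][f].Length];
    }
    for (int j = 0; j < n; j++)
        for (int jp = j + 1; jp < n; jp++)
        {
            float c = g * x[j] * x[jp];
            float[] a = v[j][field[jp]], b = v[jp][field[j]];
            for (int d = 0; d < a.Length; d++)
            {
                gradV[j][field[jp]][d] += c * b[d]; // d yhat / d v_{j,F(j')} = v_{j',F(j)} x_j x_j'
                gradV[jp][field[j]][d] += c * a[d]; // d yhat / d v_{j',F(j)} = v_{j,F(j')} x_j x_j'
            }
        }

    // Apply the updates; for simplicity the L2 penalty touches every parameter.
    for (int j = 0; j < n; j++)
    {
        w[j] -= eta * (g * x[j] + lambda * w[j]);
        for (int f = 0; f < v[j].Length; f++)
            for (int d = 0; d < v[j][f].Length; d++)
                v[j][f][d] -= eta * (gradV[j][f][d] + lambda * v[j][f][d]);
    }
}

// Hypothetical data: 3 features in 2 fields, latent dimension k = 2, one negative example.
float[] features = { 1f, 0.5f, 2f };
int[] fieldOf = { 0, 0, 1 };
float[] weights = new float[3];
float[][][] latent =
{
    new[] { new[] { 1f, 0f }, new[] { 0.5f, 0.5f } },
    new[] { new[] { 0f, 1f }, new[] { 1f, -1f } },
    new[] { new[] { 2f, 0f }, new[] { 0f, 0f } },
};
float before = Score(features, fieldOf, weights, latent);
SgdStep(features, fieldOf, false, weights, latent, eta: 0.01f, lambda: 0f);
float after = Score(features, fieldOf, weights, latent);
Console.WriteLine($"score before: {before}, after: {after}");
```

For a negative example with a positive score, a small step lowers the score, which reduces the logistic loss. Because each latent vector $\textbf{v}_{j,f}$ only receives gradient from pairs whose partner lies in field $f$, the per-field vectors are learned from different subsets of feature interactions.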

Check the See Also section for links to usage examples.



Methods

Fit(IDataView)

Trains and returns a FieldAwareFactorizationMachinePredictionTransformer.

Fit(IDataView, IDataView, FieldAwareFactorizationMachineModelParameters)

Continues training a FieldAwareFactorizationMachineTrainer from already trained modelParameters and/or with validation data, and returns a FieldAwareFactorizationMachinePredictionTransformer.


GetOutputSchema(SchemaShape)

Schema propagation for transformers. Returns the output schema of the data if the input schema is like the one provided.

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Appends a 'caching checkpoint' to the estimator chain. This ensures that downstream estimators are trained against cached data. A caching checkpoint is helpful before trainers that make multiple passes over the data.

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, returns a wrapping object that calls a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than a general ITransformer. However, IEstimator<TTransformer> objects are often composed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer>, where the estimator whose transformer we want is buried somewhere in the chain. For that scenario, this method lets us attach a delegate that is called once Fit is called.

Applies to

See also