# MatrixFactorizationTrainer Class

## Definition

An IEstimator<TTransformer> that predicts elements in a matrix using matrix factorization (a type of collaborative filtering).

C#:

    public sealed class MatrixFactorizationTrainer : Microsoft.ML.IEstimator<Microsoft.ML.Trainers.Recommender.MatrixFactorizationPredictionTransformer>, Microsoft.ML.Trainers.ITrainerEstimator<Microsoft.ML.Trainers.Recommender.MatrixFactorizationPredictionTransformer,Microsoft.ML.Trainers.Recommender.MatrixFactorizationModelParameters>

F#:

    type MatrixFactorizationTrainer = class
        interface ITrainerEstimator<MatrixFactorizationPredictionTransformer, MatrixFactorizationModelParameters>
        interface IEstimator<MatrixFactorizationPredictionTransformer>

VB:

    Public NotInheritable Class MatrixFactorizationTrainer
    Implements IEstimator(Of MatrixFactorizationPredictionTransformer), ITrainerEstimator(Of MatrixFactorizationPredictionTransformer, MatrixFactorizationModelParameters)

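Inheritance: MatrixFactorizationTrainer

Implements: IEstimator<MatrixFactorizationPredictionTransformer>, ITrainerEstimator<MatrixFactorizationPredictionTransformer, MatrixFactorizationModelParameters>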

## Remarks

To create this trainer, use MatrixFactorization or MatrixFactorization(Options).
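
The two creation paths can be sketched roughly as follows. This assumes the Microsoft.ML and Microsoft.ML.Recommender packages are referenced; the column names (`Value`, `MatrixColumnIndex`, `MatrixRowIndex`) and the hyperparameter values are illustrative choices, not recommendations.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class TrainerCreationSketch
{
    // Illustrative options; the column names and values here are assumptions.
    public static MatrixFactorizationTrainer.Options BuildOptions()
    {
        return new MatrixFactorizationTrainer.Options
        {
            LabelColumnName = "Value",
            MatrixColumnIndexColumnName = "MatrixColumnIndex",
            MatrixRowIndexColumnName = "MatrixRowIndex",
            ApproximationRank = 32,
            NumberOfIterations = 10
        };
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Path 1: pass the three column names; other hyperparameters keep their defaults.
        var simpleTrainer = mlContext.Recommendation().Trainers.MatrixFactorization(
            labelColumnName: "Value",
            matrixColumnIndexColumnName: "MatrixColumnIndex",
            matrixRowIndexColumnName: "MatrixRowIndex");

        // Path 2: pass an Options object for finer control.
        var configuredTrainer = mlContext.Recommendation().Trainers.MatrixFactorization(BuildOptions());
    }
}
```

The Options overload is the one to reach for when settings beyond the column names, approximation rank, learning rate, and iteration count need to change.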

### Input and Output Columns

Three input columns are required: one for matrix row indexes, one for matrix column indexes, and one for the values (i.e., labels) in the matrix. Together they define a matrix in COO (coordinate) format. The label column is of type Single, while the other two columns are key-typed scalars.

| Output Column Name | Column Type | Description |
| -- | -- | -- |
| Score | Single | The predicted matrix value at the location specified by input columns (row index column and column index column). |

### Trainer Characteristics

| | |
| -- | -- |
| Is normalization required? | Yes |
| Is caching required? | Yes |
| Required NuGet in addition to Microsoft.ML | Microsoft.ML.Recommender |
| Exportable to ONNX | No |

### Background

The basic idea of matrix factorization is to find two low-rank factor matrices that approximate the training matrix. In this module, the expected training data (the factorized matrix) is a list of tuples. Every tuple consists of a column index, a row index, and the value at the location specified by the two indices. For example, a tuple can be represented by the following data structure:

    using Microsoft.ML.Data; // Provides KeyTypeAttribute.

    // The following variables define the shape of an m-by-n matrix. Indexes start with 0; that is, our indexing system
    // is 0-based.
    const int m = 60;
    const int n = 100;

    // A tuple of row index, column index, and rating. It specifies a value in the rating matrix.
    class MatrixElement
    {
        // Matrix column index starts from 0 and is at most n-1.
        [KeyType(n)]
        public uint MatrixColumnIndex;
        // Matrix row index starts from 0 and is at most m-1.
        [KeyType(m)]
        public uint MatrixRowIndex;
        // The rating at the MatrixColumnIndex-th column and the MatrixRowIndex-th row.
        public float Value;
    }


Notice that it's not necessary to specify all entries in the training matrix, so matrix factorization can be used to fill missing values. This behavior is very helpful when building recommender systems.
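
To illustrate that only observed entries need to be listed, here is a minimal sketch in plain C# (no ML.NET calls). The helper `ToCoo` and the sample ratings are hypothetical; the sketch converts a small matrix with gaps into (row, column, value) triples, which is the COO layout the three input columns describe.

```csharp
using System;
using System.Collections.Generic;

public static class CooSketch
{
    // Lists the observed (non-null) entries of a dense matrix as (row, column, value) triples.
    public static List<(uint Row, uint Column, float Value)> ToCoo(float?[,] dense)
    {
        var entries = new List<(uint Row, uint Column, float Value)>();
        for (int r = 0; r < dense.GetLength(0); r++)
        {
            for (int c = 0; c < dense.GetLength(1); c++)
            {
                if (dense[r, c].HasValue)
                {
                    entries.Add(((uint)r, (uint)c, dense[r, c].Value));
                }
            }
        }
        return entries;
    }

    public static void Main()
    {
        // A 2-by-3 rating matrix; null marks a missing rating.
        float?[,] ratings =
        {
            { 5f, null, 3f },
            { null, 4f, null }
        };

        foreach (var e in ToCoo(ratings))
        {
            Console.WriteLine($"row={e.Row}, column={e.Column}, value={e.Value}");
        }
    }
}
```

Of the six cells in this sample matrix, only the three observed ones appear in the list; the missing cells are the ones a trained model would be asked to predict.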

To give a better sense of how matrix factorization is used in practice, consider music recommendation. Assume that user IDs and music IDs are used as row and column indexes, respectively, and that the matrix's values are ratings provided by those users. That is, rating $r$ at row $u$ and column $v$ means that user $u$ gives rating $r$ to item $v$. An incomplete matrix is very common because not every user provides feedback on every item (for example, no one can rate ten million songs).

Assume that $R\in{\mathbb R}^{m\times n}$ is an m-by-n rating matrix and that the two factor matrices are $P\in {\mathbb R}^{k\times m}$ and $Q\in {\mathbb R}^{k\times n}$, where $k$ is the approximation rank. The predicted rating at the $u$-th row and the $v$-th column of $R$ is the inner product of the $u$-th column of $P$ and the $v$-th column of $Q$; that is, $R$ is approximated by the product of $P$'s transpose ($P^T$) and $Q$. Because $k$ is usually much smaller than $m$ and $n$, $P^T Q$ is usually called a low-rank approximation of $R$.
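
The prediction rule can be sketched in plain C# as follows. This is only an illustration of the formula, not the trainer's implementation; the method name `Predict` and the factor values are made up for the example, with $P$ stored as a k-by-m array and $Q$ as a k-by-n array.

```csharp
using System;

public static class LowRankPredictionSketch
{
    // Entry (u, v) of P^T Q: the inner product of column u of P (k-by-m)
    // and column v of Q (k-by-n).
    public static double Predict(double[,] p, double[,] q, int u, int v)
    {
        int k = p.GetLength(0);
        if (q.GetLength(0) != k)
        {
            throw new ArgumentException("P and Q must have the same number of rows (k).");
        }

        double sum = 0.0;
        for (int i = 0; i < k; i++)
        {
            sum += p[i, u] * q[i, v];
        }
        return sum;
    }

    public static void Main()
    {
        // k = 2, m = 3 users, n = 2 items.
        double[,] p = { { 1.0, 0.0, 2.0 }, { 0.5, 1.0, 1.0 } };
        double[,] q = { { 2.0, 1.0 }, { 4.0, 0.0 } };

        // User 2, item 0: 2.0 * 2.0 + 1.0 * 4.0 = 8.
        Console.WriteLine(Predict(p, q, u: 2, v: 0));
    }
}
```

Each prediction costs only $k$ multiplications, which is why a small approximation rank keeps scoring cheap even when $m$ and $n$ are large.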

This trainer includes a stochastic gradient method and a coordinate descent method for finding $P$ and $Q$ by minimizing the distance between the non-missing part of $R$ and its approximation $P^T Q$. The coordinate descent method is specifically for one-class matrix factorization, where all observed ratings are positive signals (that is, all rating values are 1). The only way to invoke one-class matrix factorization is to set the loss function to the one-class squared loss when calling MatrixFactorization(Options). See Page 6 and Page 28 here for a brief introduction to standard matrix factorization and one-class matrix factorization. The default setting induces standard matrix factorization. The underlying library used in ML.NET matrix factorization can be found on a GitHub repository.
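
Selecting one-class matrix factorization might look like the sketch below. It assumes the Microsoft.ML and Microsoft.ML.Recommender packages are referenced; the column names are illustrative, and all other options are left at their defaults.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class OneClassSketch
{
    // Options that switch the trainer to one-class matrix factorization.
    // The column names are assumptions made for this example.
    public static MatrixFactorizationTrainer.Options BuildOptions()
    {
        return new MatrixFactorizationTrainer.Options
        {
            LabelColumnName = "Value",
            MatrixColumnIndexColumnName = "MatrixColumnIndex",
            MatrixRowIndexColumnName = "MatrixRowIndex",
            // This setting is what enables the one-class variant.
            LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass
        };
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        var trainer = mlContext.Recommendation().Trainers.MatrixFactorization(BuildOptions());
    }
}
```

Leaving `LossFunction` unset keeps the default, standard matrix factorization.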

For users interested in the mathematical details, please see the references below.