MatrixFactorizationTrainer Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
The IEstimator<TTransformer> to predict elements in a matrix using matrix factorization (also known as a type of collaborative filtering).
public sealed class MatrixFactorizationTrainer : Microsoft.ML.IEstimator<Microsoft.ML.Trainers.Recommender.MatrixFactorizationPredictionTransformer>, Microsoft.ML.Trainers.ITrainerEstimator<Microsoft.ML.Trainers.Recommender.MatrixFactorizationPredictionTransformer,Microsoft.ML.Trainers.Recommender.MatrixFactorizationModelParameters>
type MatrixFactorizationTrainer = class
interface ITrainerEstimator<MatrixFactorizationPredictionTransformer, MatrixFactorizationModelParameters>
interface IEstimator<MatrixFactorizationPredictionTransformer>
Public NotInheritable Class MatrixFactorizationTrainer
Implements IEstimator(Of MatrixFactorizationPredictionTransformer), ITrainerEstimator(Of MatrixFactorizationPredictionTransformer, MatrixFactorizationModelParameters)
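- Inheritance
- Object
MatrixFactorizationTrainer
- Implements
- IEstimator<MatrixFactorizationPredictionTransformer>, ITrainerEstimator<MatrixFactorizationPredictionTransformer, MatrixFactorizationModelParameters>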
Remarks
To create this trainer, use MatrixFactorization or MatrixFactorization(Options).
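As a minimal sketch of the simpler overload, assuming the Microsoft.ML.Recommender package is referenced: the column names ("Value", "MatrixColumnIndex", "MatrixRowIndex"), the wrapper class `TrainerCreationSketch`, and the hyperparameter values are illustrative assumptions, not recommendations.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class TrainerCreationSketch
{
    // Builds a matrix factorization trainer from the recommendation catalog.
    public static MatrixFactorizationTrainer Create(MLContext mlContext)
    {
        return mlContext.Recommendation().Trainers.MatrixFactorization(
            labelColumnName: "Value",                          // hypothetical label column
            matrixColumnIndexColumnName: "MatrixColumnIndex",  // hypothetical key column
            matrixRowIndexColumnName: "MatrixRowIndex",        // hypothetical key column
            approximationRank: 8,
            learningRate: 0.1,
            numberOfIterations: 20);
    }
}
```

The returned estimator is not trained yet; training happens only when Fit is called on it.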
Input and Output Columns
Three input columns are required: one for matrix row indexes, one for matrix column indexes, and one for the matrix values (that is, labels). Together they define a matrix in COO (coordinate list) format. The label column is of type Single, while the other two columns are key-typed scalars.
Output Column Name | Column Type | Description |
---|---|---|
Score | Single | The predicted matrix value at the location specified by input columns (row index column and column index column). |
Trainer Characteristics
Machine learning task | Recommender systems |
Is normalization required? | Yes |
Is caching required? | Yes |
Required NuGet in addition to Microsoft.ML | Microsoft.ML.Recommender |
Exportable to ONNX | No |
Background
The basic idea of matrix factorization is finding two low-rank factor matrices to approximate the training matrix. In this module, the expected training data (the factorized matrix) is a list of tuples. Every tuple consists of a column index, a row index, and the value at the location specified by the two indices. For an example data structure of a tuple, one can use:
// The following variables define the shape of an m-by-n matrix. Indexes start with 0; that is, our indexing system
// is 0-based.
const int m = 60;
const int n = 100;
// A tuple of row index, column index, and rating. It specifies a value in the rating matrix.
class MatrixElement
{
    // Matrix column index starts from 0 and is at most n-1.
    [KeyType(n)]
    public uint MatrixColumnIndex;
    // Matrix row index starts from 0 and is at most m-1.
    [KeyType(m)]
    public uint MatrixRowIndex;
    // The rating at the MatrixColumnIndex-th column and the MatrixRowIndex-th row.
    public float Value;
}
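To make the tuple representation concrete, the following sketch converts a small dense matrix with missing entries into (row, column, value) triples. It uses plain C# tuples rather than ML.NET types, and the class name `CooExample` and method `ToCoo` are hypothetical helpers introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CooExample
{
    // Converts a dense matrix, where null marks a missing entry, into COO triples.
    // Only observed entries are emitted, in row-major order.
    public static List<(uint Row, uint Column, float Value)> ToCoo(float?[,] dense)
    {
        var triples = new List<(uint Row, uint Column, float Value)>();
        for (int r = 0; r < dense.GetLength(0); r++)
        {
            for (int c = 0; c < dense.GetLength(1); c++)
            {
                if (dense[r, c] is float value)
                {
                    triples.Add(((uint)r, (uint)c, value));
                }
            }
        }
        return triples;
    }
}
```

Each triple corresponds to one MatrixElement-style record; entries that are null in the dense matrix simply do not appear in the list.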
Notice that it's not necessary to specify all entries in the training matrix, so matrix factorization can be used to fill missing values. This behavior is very helpful when building recommender systems.
To illustrate a practical use of matrix factorization, consider music recommendation. Assume that user IDs and music IDs are used as row and column indexes, respectively, and that the matrix values are ratings provided by those users. That is, rating $r$ at row $u$ and column $v$ means that user $u$ gives rating $r$ to item $v$. An incomplete matrix is very common because users rarely provide feedback on every product (for example, no one can rate ten million songs). Assume that $R\in{\mathbb R}^{m\times n}$ is an m-by-n rating matrix and that the two factor matrices are $P\in {\mathbb R}^{k\times m}$ and $Q\in {\mathbb R}^{k\times n}$, where $k$ is the approximation rank. The predicted rating at the $u$-th row and the $v$-th column of $R$ is the inner product of the $u$-th column of $P$ and the $v$-th column of $Q$; that is, $R$ is approximated by the product of $P$'s transpose ($P^T$) and $Q$. Note that $k$ is usually much smaller than $m$ and $n$, so $P^T Q$ is usually called a low-rank approximation of $R$.
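The prediction rule can be sketched in a few lines. Because $P$ is k-by-m and $Q$ is k-by-n, entry $(u, v)$ of $P^T Q$ is the inner product of column $u$ of $P$ and column $v$ of $Q$. The class `LowRankPrediction` and its array layout are assumptions made for this illustration and are not part of the ML.NET API.

```csharp
using System;

public static class LowRankPrediction
{
    // p is k-by-m (one column per user), q is k-by-n (one column per item).
    // Returns entry (u, v) of P^T Q, the predicted rating of item v by user u.
    public static float Predict(float[,] p, float[,] q, int u, int v)
    {
        int k = p.GetLength(0);
        if (q.GetLength(0) != k)
        {
            throw new ArgumentException("P and Q must have the same number of rows (the rank k).");
        }

        float sum = 0f;
        for (int f = 0; f < k; f++)
        {
            sum += p[f, u] * q[f, v];
        }
        return sum;
    }
}
```

Storing only $P$ and $Q$ takes $k(m + n)$ numbers instead of $mn$, which is what makes the approximation low-rank when $k$ is small.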
This trainer includes a stochastic gradient method and a coordinate descent method for finding $P$ and $Q$ by minimizing the distance between the non-missing part of $R$ and its approximation $P^T Q$. The included coordinate descent method is specifically for one-class matrix factorization, where all observed ratings are positive signals (that is, all rating values are 1). The only way to invoke one-class matrix factorization is to set the loss function to the one-class squared loss when calling MatrixFactorization(Options). See Page 6 and Page 28 here for a brief introduction to standard matrix factorization and one-class matrix factorization. The default setting induces standard matrix factorization. The underlying library used in ML.NET matrix factorization can be found on a GitHub repository.
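A sketch of selecting one-class matrix factorization through the Options overload follows. It assumes the Microsoft.ML.Recommender package is referenced; the column names, the wrapper class `OneClassTrainerSketch`, and the values chosen for Alpha, Lambda, and C are illustrative assumptions rather than tuned settings.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class OneClassTrainerSketch
{
    public static MatrixFactorizationTrainer Create(MLContext mlContext)
    {
        var options = new MatrixFactorizationTrainer.Options
        {
            MatrixColumnIndexColumnName = "MatrixColumnIndex", // hypothetical column name
            MatrixRowIndexColumnName = "MatrixRowIndex",       // hypothetical column name
            LabelColumnName = "Value",                         // hypothetical column name
            NumberOfIterations = 20,
            ApproximationRank = 32,
            // Switches from the default loss to the one-class squared loss.
            LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
            Alpha = 0.01,   // weight of the loss on unobserved entries (illustrative value)
            Lambda = 0.025, // regularization strength (illustrative value)
            C = 0.00001     // value assumed for unobserved entries (illustrative value)
        };

        return mlContext.Recommendation().Trainers.MatrixFactorization(options);
    }
}
```

Leaving LossFunction unset keeps the default, which corresponds to standard matrix factorization.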
For users interested in the mathematical details, see the references below.
- For the multi-threaded implementation of the stochastic gradient method, see A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems.
- For the computation happening inside a single thread, see A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization.
- For the parallel coordinate descent method and the one-class matrix factorization formula, see Selection of Negative Samples for One-class Matrix Factorization.
- For details on the underlying library, see LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems.
Check the See Also section for links to usage examples.
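To give a feel for what a stochastic gradient method does when fitting $P$ and $Q$, here is a deliberately simple single-threaded sketch in plain C#. It is not the implementation used by the underlying library: the fixed learning rate, the L2 regularization term `lambda`, the initialization scheme, and all names (`SgdSketch`, `Train`, `Rmse`) are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SgdSketch
{
    // Inner product of column u of P (k-by-m) and column v of Q (k-by-n).
    public static float Predict(float[,] p, float[,] q, int u, int v)
    {
        float sum = 0f;
        for (int f = 0; f < p.GetLength(0); f++) sum += p[f, u] * q[f, v];
        return sum;
    }

    // Fits P and Q to the observed entries only, using plain SGD on the
    // squared error with an (assumed) L2 penalty on the factor columns.
    public static (float[,] P, float[,] Q) Train(
        IReadOnlyList<(int Row, int Column, float Value)> observed,
        int m, int n, int k,
        int epochs = 200, float learningRate = 0.05f, float lambda = 0.01f, int seed = 0)
    {
        var rng = new Random(seed);
        var p = new float[k, m];
        var q = new float[k, n];

        // Small positive random initialization.
        for (int f = 0; f < k; f++)
        {
            for (int u = 0; u < m; u++) p[f, u] = 0.1f * (float)rng.NextDouble();
            for (int v = 0; v < n; v++) q[f, v] = 0.1f * (float)rng.NextDouble();
        }

        for (int epoch = 0; epoch < epochs; epoch++)
        {
            foreach (var (u, v, r) in observed)
            {
                float error = r - Predict(p, q, u, v);
                for (int f = 0; f < k; f++)
                {
                    // Read both factors before updating so each step uses the old values.
                    float pf = p[f, u];
                    float qf = q[f, v];
                    p[f, u] += learningRate * (error * qf - lambda * pf);
                    q[f, v] += learningRate * (error * pf - lambda * qf);
                }
            }
        }
        return (p, q);
    }

    // Root-mean-square error over the given entries.
    public static double Rmse(
        IReadOnlyList<(int Row, int Column, float Value)> entries, float[,] p, float[,] q)
    {
        double total = 0;
        foreach (var (u, v, r) in entries)
        {
            double diff = r - Predict(p, q, u, v);
            total += diff * diff;
        }
        return Math.Sqrt(total / entries.Count);
    }
}
```

Entries that are absent from the observed list never contribute to the updates, which is why the fitted factors can afterwards be used to predict the missing positions.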
Properties
Info | The TrainerInfo contains general parameters for this trainer. |
Methods
Fit(IDataView, IDataView) | Trains a MatrixFactorizationTrainer using both training and validation data, returns a MatrixFactorizationPredictionTransformer. |
Fit(IDataView) | Trains and returns a MatrixFactorizationPredictionTransformer. |
GetOutputSchema(SchemaShape) | Schema propagation for transformers. Returns the output schema of the data, if the input schema is like the one provided. |
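The following sketch shows how the two-argument Fit overload might be used end to end, assuming the Microsoft.ML and Microsoft.ML.Recommender packages are referenced. The synthetic ratings, the class names (`FitSketch`, `MatrixElement`, `ScoredElement`), the use of the held-out split as validation data, and the hyperparameter values are all assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class FitSketch
{
    private const uint RowCount = 60;
    private const uint ColumnCount = 100;

    public class MatrixElement
    {
        [KeyType(ColumnCount)]
        public uint MatrixColumnIndex;
        [KeyType(RowCount)]
        public uint MatrixRowIndex;
        public float Value;
    }

    // Receives the Score output column produced by the trained model.
    public class ScoredElement
    {
        public float Score;
    }

    public static List<float> TrainAndScore()
    {
        var mlContext = new MLContext(seed: 0);

        // Synthetic ratings, used only so the example has data to train on.
        var elements = new List<MatrixElement>();
        for (uint c = 0; c < ColumnCount; c++)
            for (uint r = 0; r < RowCount; r++)
                elements.Add(new MatrixElement { MatrixColumnIndex = c, MatrixRowIndex = r, Value = (c + r) % 5 });

        IDataView data = mlContext.Data.LoadFromEnumerable(elements);
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

        var trainer = mlContext.Recommendation().Trainers.MatrixFactorization(
            labelColumnName: nameof(MatrixElement.Value),
            matrixColumnIndexColumnName: nameof(MatrixElement.MatrixColumnIndex),
            matrixRowIndexColumnName: nameof(MatrixElement.MatrixRowIndex),
            approximationRank: 16,
            numberOfIterations: 20);

        // Fit(IDataView, IDataView): train on one set, validate on the other.
        var model = trainer.Fit(split.TrainSet, split.TestSet);

        // The transformer appends a Score column with the predicted matrix values.
        IDataView scored = model.Transform(split.TestSet);
        return mlContext.Data
            .CreateEnumerable<ScoredElement>(scored, reuseRowObject: false)
            .Select(s => s.Score)
            .ToList();
    }
}
```

Calling `trainer.Fit(split.TrainSet)` instead uses the single-argument overload and trains without validation data.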