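RecommendationCatalog.RecommendationTrainers.MatrixFactorization Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it's released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads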
MatrixFactorization(MatrixFactorizationTrainer+Options) |
Creates a MatrixFactorizationTrainer with advanced options, which predicts element values in a matrix using matrix factorization. |
MatrixFactorization(String, String, String, Int32, Double, Int32) |
Creates a MatrixFactorizationTrainer, which predicts element values in a matrix using matrix factorization. |
MatrixFactorization(MatrixFactorizationTrainer+Options)
Creates a MatrixFactorizationTrainer with advanced options, which predicts element values in a matrix using matrix factorization.
public Microsoft.ML.Trainers.MatrixFactorizationTrainer MatrixFactorization (Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options options);
member this.MatrixFactorization : Microsoft.ML.Trainers.MatrixFactorizationTrainer.Options -> Microsoft.ML.Trainers.MatrixFactorizationTrainer
Public Function MatrixFactorization (options As MatrixFactorizationTrainer.Options) As MatrixFactorizationTrainer
Parameters
- options
- MatrixFactorizationTrainer.Options
Trainer options.
Returns
Examples
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
namespace Samples.Dynamic.Trainers.Recommendation
{
public static class MatrixFactorizationWithOptions
{
// This example requires installing the additional NuGet package
// Microsoft.ML.Recommender, available at
// https://www.nuget.org/packages/Microsoft.ML.Recommender/
// In this example we will create in-memory data and then use it to train
// a matrix factorization model with advanced options. Afterward, quality
// metrics are reported.
public static void Example()
{
// Create a new context for ML.NET operations. It can be used for
// exception tracking and logging, as a catalog of available operations
// and as the source of randomness. Setting the seed to a fixed number
// in this example to make outputs deterministic.
var mlContext = new MLContext(seed: 0);
// Create a list of training data points.
var dataPoints = GenerateMatrix();
// Convert the list of data points to an IDataView object, which is
// consumable by ML.NET API.
var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);
// Define trainer options.
var options = new MatrixFactorizationTrainer.Options
{
// Specify IDataView column which stores matrix column indexes.
MatrixColumnIndexColumnName = nameof(MatrixElement.MatrixColumnIndex),
// Specify IDataView column which stores matrix row indexes.
MatrixRowIndexColumnName = nameof(MatrixElement.MatrixRowIndex),
// Specify IDataView column which stores matrix elements' values.
LabelColumnName = nameof(MatrixElement.Value),
// Number of passes over the entire data set.
NumberOfIterations = 10,
// Number of threads used to run this trainer.
NumberOfThreads = 1,
// The rank of factor matrices. Note that the product of the two
// factor matrices approximates the training matrix.
ApproximationRank = 32,
// Step length when moving along the stochastic gradient direction. The
// training algorithm may adjust it for faster convergence. Note that
// faster convergence means we can use fewer iterations to achieve
// similar test scores.
LearningRate = 0.3
};
// Define the trainer.
var pipeline = mlContext.Recommendation().Trainers.MatrixFactorization(
options);
// Train the model.
var model = pipeline.Fit(trainingData);
// Run the model on training data set.
var transformedData = model.Transform(trainingData);
// Convert IDataView object to a list.
var predictions = mlContext.Data
.CreateEnumerable<MatrixElement>(transformedData,
reuseRowObject: false).Take(5).ToList();
// Look at 5 predictions for the Label, side by side with the actual
// Label for comparison.
foreach (var p in predictions)
Console.WriteLine($"Actual value: {p.Value:F3}, " +
$"Predicted score: {p.Score:F3}");
// Expected output:
// Actual value: 0.000, Predicted score: 0.031
// Actual value: 1.000, Predicted score: 0.863
// Actual value: 2.000, Predicted score: 1.821
// Actual value: 3.000, Predicted score: 2.714
// Actual value: 4.000, Predicted score: 3.176
// Evaluate the overall metrics
var metrics = mlContext.Regression.Evaluate(transformedData,
labelColumnName: nameof(MatrixElement.Value),
scoreColumnName: nameof(MatrixElement.Score));
PrintMetrics(metrics);
// Expected output:
// Mean Absolute Error: 0.18
// Mean Squared Error: 0.05
// Root Mean Squared Error: 0.23
// RSquared: 0.97 (closer to 1 is better. The worst case is 0)
}
// The following variables are used to define the shape of the example
// matrix. Its shape is MatrixRowCount-by-MatrixColumnCount. Because in
// ML.NET key type's minimal value is zero, the first row index is always
// zero in C# data structure (e.g., MatrixColumnIndex=0 and MatrixRowIndex=0
// in MatrixElement below specifies the value at the upper-left corner in
// the training matrix). If user's row index starts with 1, their row index
// 1 would be mapped to the 2nd row in matrix factorization module and their
// first row may contain no values. This behavior is also true to column
// index.
private const uint MatrixColumnCount = 60;
private const uint MatrixRowCount = 100;
// Generate a synthetic matrix by specifying all its elements.
private static List<MatrixElement> GenerateMatrix()
{
var dataMatrix = new List<MatrixElement>();
for (uint i = 0; i < MatrixColumnCount; ++i)
for (uint j = 0; j < MatrixRowCount; ++j)
dataMatrix.Add(new MatrixElement()
{
MatrixColumnIndex = i,
MatrixRowIndex = j,
Value = (i + j) % 5
});
return dataMatrix;
}
// A class used to define a matrix element and capture its prediction
// result.
private class MatrixElement
{
// Matrix column index. Its allowed range is from 0 to
// MatrixColumnCount - 1.
[KeyType(MatrixColumnCount)]
public uint MatrixColumnIndex { get; set; }
// Matrix row index. Its allowed range is from 0 to MatrixRowCount - 1.
[KeyType(MatrixRowCount)]
public uint MatrixRowIndex { get; set; }
// The actual value at the MatrixColumnIndex-th column and the
// MatrixRowIndex-th row.
public float Value { get; set; }
// The predicted value at the MatrixColumnIndex-th column and the
// MatrixRowIndex-th row.
public float Score { get; set; }
}
// Print some evaluation metrics for regression problems.
private static void PrintMetrics(RegressionMetrics metrics)
{
Console.WriteLine("Mean Absolute Error: " + metrics.MeanAbsoluteError);
Console.WriteLine("Mean Squared Error: " + metrics.MeanSquaredError);
Console.WriteLine("Root Mean Squared Error: " +
metrics.RootMeanSquaredError);
Console.WriteLine("RSquared: " + metrics.RSquared);
}
}
}
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
namespace Samples.Dynamic.Trainers.Recommendation
{
public static class OneClassMatrixFactorizationWithOptions
{
// This example shows the use of ML.NET's one-class matrix factorization
// module which implements a coordinate descent method described in
// Algorithm 1 in the paper found at
// https://www.csie.ntu.edu.tw/~cjlin/papers/one-class-mf/biased-mf-sdm-with-supp.pdf
// See page 28 of the slides
// at https://www.csie.ntu.edu.tw/~cjlin/talks/facebook.pdf for a brief
// introduction to one-class matrix factorization.
// In this example we will create in-memory data and then use it to train a
// one-class matrix factorization model. Afterward, prediction values are
// reported. Running this example requires installing the additional
// NuGet package Microsoft.ML.Recommender, found at
// https://www.nuget.org/packages/Microsoft.ML.Recommender/
public static void Example()
{
// Create a new context for ML.NET operations. It can be used for
// exception tracking and logging, as a catalog of available operations
// and as the source of randomness.
var mlContext = new MLContext(seed: 0);
// Get a small in-memory dataset.
GetOneClassMatrix(out List<MatrixElement> data,
out List<MatrixElement> testData);
// Convert the in-memory matrix into an IDataView so that ML.NET
// components can consume it.
var dataView = mlContext.Data.LoadFromEnumerable(data);
// Create a matrix factorization trainer which takes "Value" as the
// training label, "MatrixColumnIndex" as the matrix's column index, and
// "MatrixRowIndex" as the matrix's row index. Here nameof(...) is used
// to extract the property names of the MatrixElement class.
var options = new MatrixFactorizationTrainer.Options
{
MatrixColumnIndexColumnName = nameof(
MatrixElement.MatrixColumnIndex),
MatrixRowIndexColumnName = nameof(MatrixElement.MatrixRowIndex),
LabelColumnName = nameof(MatrixElement.Value),
NumberOfIterations = 20,
NumberOfThreads = 8,
ApproximationRank = 32,
Alpha = 1,
// The desired values of matrix elements not specified in the
// training set. If the training set doesn't tell the value at the
// u-th row and v-th column, its desired value would be set to 0.15.
// In other words, this parameter determines the value of all
// missing matrix elements.
C = 0.15,
// This argument enables one-class matrix factorization.
LossFunction = MatrixFactorizationTrainer.LossFunctionType
.SquareLossOneClass
};
var pipeline = mlContext.Recommendation().Trainers.MatrixFactorization(
options);
// Train a matrix factorization model.
var model = pipeline.Fit(dataView);
// Apply the trained model to the test set. Notice that the training set
// contains only part of the matrix, while the test set contains every
// element.
var prediction = model.Transform(mlContext.Data.LoadFromEnumerable(
testData));
var results = mlContext.Data.CreateEnumerable<MatrixElement>(prediction,
false).ToList();
// Feed the test data into the model and then iterate through a few
// predictions.
foreach (var pred in results.Take(15))
Console.WriteLine($"Predicted value at row " +
$"{pred.MatrixRowIndex - 1} and column " +
$"{pred.MatrixColumnIndex - 1} is {pred.Score} and its " +
$"expected value is {pred.Value}.");
// Expected output similar to:
// Predicted value at row 0 and column 0 is 0.9873335 and its expected value is 1.
// Predicted value at row 1 and column 0 is 0.1499522 and its expected value is 0.15.
// Predicted value at row 2 and column 0 is 0.1499791 and its expected value is 0.15.
// Predicted value at row 3 and column 0 is 0.1499254 and its expected value is 0.15.
// Predicted value at row 4 and column 0 is 0.1499074 and its expected value is 0.15.
// Predicted value at row 5 and column 0 is 0.1499968 and its expected value is 0.15.
// Predicted value at row 6 and column 0 is 0.1499791 and its expected value is 0.15.
// Predicted value at row 7 and column 0 is 0.1499805 and its expected value is 0.15.
// Predicted value at row 8 and column 0 is 0.1500055 and its expected value is 0.15.
// Predicted value at row 9 and column 0 is 0.1499199 and its expected value is 0.15.
// Predicted value at row 10 and column 0 is 0.9873335 and its expected value is 1.
// Predicted value at row 11 and column 0 is 0.1499522 and its expected value is 0.15.
// Predicted value at row 12 and column 0 is 0.1499791 and its expected value is 0.15.
// Predicted value at row 13 and column 0 is 0.1499254 and its expected value is 0.15.
// Predicted value at row 14 and column 0 is 0.1499074 and its expected value is 0.15.
//
// Note: use the advanced options constructor to set the number of
// threads to 1 for a deterministic behavior.
// Assuming the row index is a user ID and the column index is a game ID,
// the following list contains the games recommended by the trained model.
// Note that sometimes you may want to exclude training data from your
// predicted results because those would represent games that were
// already purchased. The variable topColumns stores the two matrix
// elements with the highest predicted scores on the 1st row.
var topColumns = results.Where(element => element.MatrixRowIndex == 1)
.OrderByDescending(element => element.Score).Take(2);
Console.WriteLine("Top 2 predictions on the 1st row:");
foreach (var top in topColumns)
Console.WriteLine($"Predicted value at row " +
$"{top.MatrixRowIndex - 1} and column " +
$"{top.MatrixColumnIndex - 1} is {top.Score} and its " +
$"expected value is {top.Value}.");
// Expected output similar to:
// Top 2 predictions on the 1st row:
// Predicted value at row 0 and column 0 is 0.9871138 and its expected value is 1.
// Predicted value at row 0 and column 10 is 0.9871138 and its expected value is 1.
}
// The following variables define the shape of a matrix. Its shape is
// _synthesizedMatrixRowCount-by-_synthesizedMatrixColumnCount.
// Because in ML.NET key type's minimal value is zero, the first row index
// is always zero in C# data structure (e.g., MatrixColumnIndex=0 and
// MatrixRowIndex=0 in MatrixElement below specifies the value at the
// upper-left corner in the training matrix). If user's row index
// starts with 1, their row index 1 would be mapped to the 2nd row in matrix
// factorization module and their first row may contain no values.
// This behavior is also true to column index.
private const uint _synthesizedMatrixColumnCount = 60;
private const uint _synthesizedMatrixRowCount = 100;
// A data structure used to encode a single value in a matrix.
private class MatrixElement
{
// Matrix column index. Its allowed range is from 0 to
// _synthesizedMatrixColumnCount - 1.
[KeyType(_synthesizedMatrixColumnCount)]
public uint MatrixColumnIndex { get; set; }
// Matrix row index. Its allowed range is from 0 to
// _synthesizedMatrixRowCount - 1.
[KeyType(_synthesizedMatrixRowCount)]
public uint MatrixRowIndex { get; set; }
// The value at the MatrixColumnIndex-th column and the
// MatrixRowIndex-th row.
public float Value { get; set; }
// The predicted value at the MatrixColumnIndex-th column and the
// MatrixRowIndex-th row.
public float Score { get; set; }
}
// Create an in-memory matrix as a list of tuples (column index, row index,
// value). Notice that one-class matrix factorization handles scenarios where
// only positive signals can be observed (e.g., on Facebook, only likes are
// recorded and never dislikes), so all observed values are set to 1.
private static void GetOneClassMatrix(
out List<MatrixElement> observedMatrix,
out List<MatrixElement> fullMatrix)
{
// The matrix factorization model will be trained using only
// observedMatrix, but we will see that it can learn all the information
// carried in fullMatrix.
observedMatrix = new List<MatrixElement>();
fullMatrix = new List<MatrixElement>();
for (uint i = 0; i < _synthesizedMatrixColumnCount; ++i)
for (uint j = 0; j < _synthesizedMatrixRowCount; ++j)
{
if ((i + j) % 10 == 0)
{
// Set observed elements' values to 1 (means like).
observedMatrix.Add(new MatrixElement()
{
MatrixColumnIndex = i,
MatrixRowIndex = j,
Value = 1,
Score = 0
});
fullMatrix.Add(new MatrixElement()
{
MatrixColumnIndex = i,
MatrixRowIndex = j,
Value = 1,
Score = 0
});
}
else
// Set unobserved elements' values to 0.15, a value smaller
// than observed values (means dislike).
fullMatrix.Add(new MatrixElement()
{
MatrixColumnIndex = i,
MatrixRowIndex = j,
Value = 0.15f,
Score = 0
});
}
}
}
}
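Remarks
The basic idea of matrix factorization is to find two low-rank factor matrices that approximate the training matrix.
In this module, the expected training data is a list of tuples. Each tuple consists of a column index, a row index, and the value at the location specified by the two indexes. The training configuration is encoded in MatrixFactorizationTrainer.Options. To invoke one-class matrix factorization, the user must specify SquareLossOneClass. The default setting, SquareLossRegression, is for standard matrix factorization problems.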
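The difference between the two loss settings can be sketched in plain C#. This is an illustration only and not ML.NET's implementation: the class LossSketch, its method names, and the predict delegate are hypothetical, regularization terms are omitted, and reading Alpha as the weight applied to unobserved cells is an assumption based on how that option appears next to C in the one-class example above.

```csharp
using System;
using System.Collections.Generic;

public static class LossSketch
{
    // Standard setting: only the cells listed in the training tuples
    // contribute to the squared error.
    public static double SquareLossRegression(
        IReadOnlyDictionary<(int Row, int Column), double> observed,
        Func<int, int, double> predict)
    {
        double loss = 0;
        foreach (var entry in observed)
        {
            double error = entry.Value - predict(entry.Key.Row, entry.Key.Column);
            loss += error * error;
        }
        return loss;
    }

    // One-class setting (assumed form): every cell contributes. A cell that is
    // missing from the training tuples is treated as if its value were c, and
    // its squared error is weighted by alpha.
    public static double SquareLossOneClass(
        IReadOnlyDictionary<(int Row, int Column), double> observed,
        int rowCount, int columnCount, double c, double alpha,
        Func<int, int, double> predict)
    {
        double loss = 0;
        for (int row = 0; row < rowCount; ++row)
            for (int column = 0; column < columnCount; ++column)
            {
                double prediction = predict(row, column);
                if (observed.TryGetValue((row, column), out double value))
                    loss += (value - prediction) * (value - prediction);
                else
                    loss += alpha * (c - prediction) * (c - prediction);
            }
        return loss;
    }

    public static void Example()
    {
        // A 2-by-2 matrix in which only the upper-left cell was observed.
        var observed = new Dictionary<(int Row, int Column), double>
        {
            [(0, 0)] = 1.0
        };

        // A trivial model that predicts 0 everywhere.
        Func<int, int, double> predictZero = (row, column) => 0.0;

        Console.WriteLine("Regression loss: " +
            SquareLossRegression(observed, predictZero));
        Console.WriteLine("One-class loss: " +
            SquareLossOneClass(observed, 2, 2, 0.15, 1.0, predictZero));
    }
}
```

Under this reading, the regression loss ignores the three unobserved cells, while the one-class loss also penalizes the model for predicting something other than 0.15 in each of them.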
Applies to
MatrixFactorization(String, String, String, Int32, Double, Int32)
Creates a MatrixFactorizationTrainer, which predicts element values in a matrix using matrix factorization.
public Microsoft.ML.Trainers.MatrixFactorizationTrainer MatrixFactorization (string labelColumnName, string matrixColumnIndexColumnName, string matrixRowIndexColumnName, int approximationRank = 8, double learningRate = 0.1, int numberOfIterations = 20);
member this.MatrixFactorization : string * string * string * int * double * int -> Microsoft.ML.Trainers.MatrixFactorizationTrainer
Public Function MatrixFactorization (labelColumnName As String, matrixColumnIndexColumnName As String, matrixRowIndexColumnName As String, Optional approximationRank As Integer = 8, Optional learningRate As Double = 0.1, Optional numberOfIterations As Integer = 20) As MatrixFactorizationTrainer
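Parameters
- labelColumnName
- String
The name of the label column, which stores the matrix element values.
- matrixColumnIndexColumnName
- String
The name of the column hosting the matrix's column IDs. The column data must be KeyDataViewType.
- matrixRowIndexColumnName
- String
The name of the column hosting the matrix's row IDs. The column data must be KeyDataViewType.
- approximationRank
- Int32
The rank of the approximation matrices.
- learningRate
- Double
The initial learning rate. It determines how fast the training algorithm moves at each step.
- numberOfIterations
- Int32
The number of training iterations.
Returns
Examples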
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
namespace Samples.Dynamic.Trainers.Recommendation
{
public static class MatrixFactorization
{
// This example requires installing the additional NuGet package
// Microsoft.ML.Recommender, available at
// https://www.nuget.org/packages/Microsoft.ML.Recommender/
// In this example we will create in-memory data and then use it to train
// a matrix factorization model with an explicitly chosen rank, learning
// rate, and iteration count. Afterward, quality metrics are reported.
public static void Example()
{
// Create a new context for ML.NET operations. It can be used for
// exception tracking and logging, as a catalog of available operations
// and as the source of randomness. Setting the seed to a fixed number
// in this example to make outputs deterministic.
var mlContext = new MLContext(seed: 0);
// Create a list of training data points.
var dataPoints = GenerateMatrix();
// Convert the list of data points to an IDataView object, which is
// consumable by ML.NET API.
var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);
// Define the trainer.
var pipeline = mlContext.Recommendation().Trainers.
MatrixFactorization(nameof(MatrixElement.Value),
nameof(MatrixElement.MatrixColumnIndex),
nameof(MatrixElement.MatrixRowIndex), 10, 0.2, 1);
// Train the model.
var model = pipeline.Fit(trainingData);
// Run the model on training data set.
var transformedData = model.Transform(trainingData);
// Convert IDataView object to a list.
var predictions = mlContext.Data
.CreateEnumerable<MatrixElement>(transformedData,
reuseRowObject: false).Take(5).ToList();
// Look at 5 predictions for the Label, side by side with the actual
// Label for comparison.
foreach (var p in predictions)
Console.WriteLine($"Actual value: {p.Value:F3}, " +
$"Predicted score: {p.Score:F3}");
// Expected output:
// Actual value: 0.000, Predicted score: 1.234
// Actual value: 1.000, Predicted score: 0.792
// Actual value: 2.000, Predicted score: 1.831
// Actual value: 3.000, Predicted score: 2.670
// Actual value: 4.000, Predicted score: 2.362
// Evaluate the overall metrics
var metrics = mlContext.Regression.Evaluate(transformedData,
labelColumnName: nameof(MatrixElement.Value),
scoreColumnName: nameof(MatrixElement.Score));
PrintMetrics(metrics);
// Expected output:
// Mean Absolute Error: 0.67
// Mean Squared Error: 0.79
// Root Mean Squared Error: 0.89
// RSquared: 0.61 (closer to 1 is better. The worst case is 0)
}
// The following variables are used to define the shape of the example
// matrix. Its shape is MatrixRowCount-by-MatrixColumnCount. Because in
// ML.NET key type's minimal value is zero, the first row index is always
// zero in C# data structure (e.g., MatrixColumnIndex=0 and MatrixRowIndex=0
// in MatrixElement below specifies the value at the upper-left corner in
// the training matrix). If user's row index starts with 1, their row index
// 1 would be mapped to the 2nd row in matrix factorization module and their
// first row may contain no values. This behavior is also true to column
// index.
private const uint MatrixColumnCount = 60;
private const uint MatrixRowCount = 100;
// Generate a synthetic matrix by specifying all its elements.
private static List<MatrixElement> GenerateMatrix()
{
var dataMatrix = new List<MatrixElement>();
for (uint i = 0; i < MatrixColumnCount; ++i)
for (uint j = 0; j < MatrixRowCount; ++j)
dataMatrix.Add(new MatrixElement()
{
MatrixColumnIndex = i,
MatrixRowIndex = j,
Value = (i + j) % 5
});
return dataMatrix;
}
// A class used to define a matrix element and capture its prediction
// result.
private class MatrixElement
{
// Matrix column index. Its allowed range is from 0 to
// MatrixColumnCount - 1.
[KeyType(MatrixColumnCount)]
public uint MatrixColumnIndex { get; set; }
// Matrix row index. Its allowed range is from 0 to MatrixRowCount - 1.
[KeyType(MatrixRowCount)]
public uint MatrixRowIndex { get; set; }
// The actual value at the MatrixColumnIndex-th column and the
// MatrixRowIndex-th row.
public float Value { get; set; }
// The predicted value at the MatrixColumnIndex-th column and the
// MatrixRowIndex-th row.
public float Score { get; set; }
}
// Print some evaluation metrics for regression problems.
private static void PrintMetrics(RegressionMetrics metrics)
{
Console.WriteLine("Mean Absolute Error: " + metrics.MeanAbsoluteError);
Console.WriteLine("Mean Squared Error: " + metrics.MeanSquaredError);
Console.WriteLine("Root Mean Squared Error: " +
metrics.RootMeanSquaredError);
Console.WriteLine("RSquared: " + metrics.RSquared);
}
}
}
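Remarks
The basic idea of matrix factorization is to find two low-rank factor matrices that approximate the training matrix.
In this module, the expected training data is a list of tuples. Each tuple consists of a column index, a row index, and the value at the location specified by the two indexes.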
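To make the tuple format and the low-rank idea concrete, here is a minimal sketch in plain C# that factorizes a small matrix with ordinary stochastic gradient descent on the squared error. It is not the algorithm ML.NET runs internally. The class LowRankSketch and its Factorize and Predict helpers are hypothetical names introduced for illustration, and the parameters rank, learningRate, and iterations only loosely mirror approximationRank, learningRate, and numberOfIterations.

```csharp
using System;
using System.Collections.Generic;

public static class LowRankSketch
{
    // Learns one factor vector per matrix row and one per matrix column so
    // that their dot product approximates each observed value.
    public static (double[][] RowFactors, double[][] ColumnFactors) Factorize(
        IReadOnlyList<(int Column, int Row, double Value)> tuples,
        int rowCount, int columnCount, int rank,
        double learningRate, int iterations, int seed = 0)
    {
        var random = new Random(seed);

        // Start from small positive random factors.
        double[][] Init(int count)
        {
            var factors = new double[count][];
            for (int i = 0; i < count; ++i)
            {
                factors[i] = new double[rank];
                for (int k = 0; k < rank; ++k)
                    factors[i][k] = random.NextDouble() * 0.1;
            }
            return factors;
        }

        var rowFactors = Init(rowCount);
        var columnFactors = Init(columnCount);

        for (int iteration = 0; iteration < iterations; ++iteration)
            foreach (var (column, row, value) in tuples)
            {
                // Move both factor vectors along the negative gradient of the
                // squared error for this single tuple.
                double error = value -
                    Predict(rowFactors[row], columnFactors[column]);
                for (int k = 0; k < rank; ++k)
                {
                    double oldRowFactor = rowFactors[row][k];
                    rowFactors[row][k] +=
                        learningRate * error * columnFactors[column][k];
                    columnFactors[column][k] +=
                        learningRate * error * oldRowFactor;
                }
            }

        return (rowFactors, columnFactors);
    }

    // The approximated matrix value is the dot product of the two factors.
    public static double Predict(double[] rowFactor, double[] columnFactor)
    {
        double sum = 0;
        for (int k = 0; k < rowFactor.Length; ++k)
            sum += rowFactor[k] * columnFactor[k];
        return sum;
    }

    public static void Example()
    {
        // Tuples for a 3-by-2 matrix whose value at (row, column) is
        // rowScale[row] * columnScale[column], so it has rank 1 exactly.
        double[] rowScale = { 1.0, 2.0, 3.0 };
        double[] columnScale = { 0.5, 1.0 };
        var tuples = new List<(int Column, int Row, double Value)>();
        for (int column = 0; column < columnScale.Length; ++column)
            for (int row = 0; row < rowScale.Length; ++row)
                tuples.Add((column, row, rowScale[row] * columnScale[column]));

        var (rowFactors, columnFactors) = Factorize(tuples, 3, 2,
            rank: 2, learningRate: 0.05, iterations: 2000);

        foreach (var (column, row, value) in tuples)
            Console.WriteLine($"Row {row}, column {column}: actual " +
                $"{value:F2}, approximated " +
                $"{Predict(rowFactors[row], columnFactors[column]):F2}");
    }
}
```

The sketch keeps the same (column index, row index, value) ordering as the tuples described in these remarks, and uses zero-based indexes to match the key type convention noted in the sample comments.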