PcaCatalog.RandomizedPca Method
Definition
Overloads
RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, RandomizedPcaTrainer+Options)
Create a RandomizedPcaTrainer with advanced options, which trains an approximate principal component analysis (PCA) model using a randomized singular value decomposition (SVD) algorithm.
RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, String, String, Int32, Int32, Boolean, Nullable<Int32>)
Create a RandomizedPcaTrainer, which trains an approximate principal component analysis (PCA) model using a randomized singular value decomposition (SVD) algorithm.
RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, RandomizedPcaTrainer+Options)
Create a RandomizedPcaTrainer with advanced options, which trains an approximate principal component analysis (PCA) model using a randomized singular value decomposition (SVD) algorithm.
public static Microsoft.ML.Trainers.RandomizedPcaTrainer RandomizedPca (this Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers catalog, Microsoft.ML.Trainers.RandomizedPcaTrainer.Options options);
static member RandomizedPca : Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers * Microsoft.ML.Trainers.RandomizedPcaTrainer.Options -> Microsoft.ML.Trainers.RandomizedPcaTrainer
<Extension()>
Public Function RandomizedPca (catalog As AnomalyDetectionCatalog.AnomalyDetectionTrainers, options As RandomizedPcaTrainer.Options) As RandomizedPcaTrainer
Parameters
- catalog
- AnomalyDetectionCatalog.AnomalyDetectionTrainers
The anomaly detection catalog trainer object.
- options
- RandomizedPcaTrainer.Options
Advanced options for the algorithm.
Returns
RandomizedPcaTrainer
Examples
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
namespace Samples.Dynamic.Trainers.AnomalyDetection
{
    public static class RandomizedPcaSampleWithOptions
    {
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available
            // operations, and as the source of randomness. The seed is fixed
            // in this example to make the outputs deterministic.
            var mlContext = new MLContext(seed: 0);
            // Training data.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 3} },
                new DataPoint(){ Features = new float[3] {0, 2, 4} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 2} },
                new DataPoint(){ Features = new float[3] {0, 2, 3} },
                new DataPoint(){ Features = new float[3] {0, 2, 4} },
                new DataPoint(){ Features = new float[3] {1, 0, 0} }
            };
            // Convert the List<DataPoint> to an IDataView, the format
            // consumed by ML.NET functions.
            var data = mlContext.Data.LoadFromEnumerable(samples);
            var options = new Microsoft.ML.Trainers.RandomizedPcaTrainer.Options()
            {
                FeatureColumnName = nameof(DataPoint.Features),
                Rank = 1,
                Seed = 10,
            };
            // Create an anomaly detector. Its underlying algorithm is
            // randomized PCA.
            var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(
                options);
            // Train the anomaly detector.
            var model = pipeline.Fit(data);
            // Apply the trained model to the training data.
            var transformed = model.Transform(data);
            // Read ML.NET predictions into a List<Result>.
            var results = mlContext.Data.CreateEnumerable<Result>(transformed,
                reuseRowObject: false).ToList();
            // Go through all predictions.
            for (int i = 0; i < samples.Count; ++i)
            {
                // The i-th example's prediction result.
                var result = results[i];
                // The i-th example's feature vector in text format.
                var featuresInText = string.Join(',', samples[i].Features);
                if (result.PredictedLabel)
                    // The i-th sample is predicted as an outlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an outlier with a score of being outlier {2}", i,
                        featuresInText, result.Score);
                else
                    // The i-th sample is predicted as an inlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an inlier with a score of being outlier {2}", i,
                        featuresInText, result.Score);
            }
            // Expected output:
            // The 0-th example with features [0,2,1] is an inlier with a score of being outlier 0.2264826
            // The 1-th example with features [0,2,3] is an inlier with a score of being outlier 0.1739471
            // The 2-th example with features [0,2,4] is an inlier with a score of being outlier 0.05711612
            // The 3-th example with features [0,2,1] is an inlier with a score of being outlier 0.2264826
            // The 4-th example with features [0,2,2] is an inlier with a score of being outlier 0.3868995
            // The 5-th example with features [0,2,3] is an inlier with a score of being outlier 0.1739471
            // The 6-th example with features [0,2,4] is an inlier with a score of being outlier 0.05711612
            // The 7-th example with features [1,0,0] is an outlier with a score of being outlier 0.6260795
        }
        // Example with 3 feature values. A training data set is a collection
        // of such examples.
        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features { get; set; }
        }
        // Class used to capture the prediction for each DataPoint.
        private class Result
        {
            // True for an outlier, false for an inlier.
            public bool PredictedLabel { get; set; }
            // Inliers get smaller scores. Scores range from 0 to 1.
            public float Score { get; set; }
        }
    }
}
Remarks
By default, the threshold used to assign a label to a data point from its predicted score is 0.5. Scores range from 0 to 1, and a data point whose predicted score is higher than 0.5 is considered an outlier. Use ChangeModelThreshold<TModel>(AnomalyPredictionTransformer<TModel>, Single) to change this threshold.
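The labeling rule in this remark can be restated in a few lines of plain C# that do not depend on ML.NET. This is a minimal sketch: `ThresholdSketch` and `Label` are hypothetical names, not part of the ML.NET API, and the code only encodes the documented rule that a score strictly higher than the threshold (0.5 by default) yields the outlier label.

```csharp
using System;

// Hypothetical helper restating the documented labeling rule;
// it is not part of the ML.NET API.
public static class ThresholdSketch
{
    // Default threshold described in the remarks.
    public const float DefaultThreshold = 0.5f;

    // Returns true (outlier) when the score is strictly higher than the threshold.
    public static bool Label(float score, float threshold = DefaultThreshold)
    {
        // The remarks state that scores range from 0 to 1.
        if (score < 0f || score > 1f)
            throw new ArgumentOutOfRangeException(nameof(score),
                "Scores are expected to lie in [0, 1].");
        return score > threshold;
    }
}
```

With the scores from the first example, `Label(0.6260795f)` is true and `Label(0.3868995f)` is false; passing a lower threshold such as `Label(0.3868995f, 0.3f)` flips the second result to true, which is the kind of change ChangeModelThreshold applies to a trained model.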
RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, String, String, Int32, Int32, Boolean, Nullable<Int32>)
Create a RandomizedPcaTrainer, which trains an approximate principal component analysis (PCA) model using a randomized singular value decomposition (SVD) algorithm.
public static Microsoft.ML.Trainers.RandomizedPcaTrainer RandomizedPca (this Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers catalog, string featureColumnName = "Features", string exampleWeightColumnName = default, int rank = 20, int oversampling = 20, bool ensureZeroMean = true, int? seed = default);
static member RandomizedPca : Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers * string * string * int * int * bool * Nullable<int> -> Microsoft.ML.Trainers.RandomizedPcaTrainer
<Extension()>
Public Function RandomizedPca (catalog As AnomalyDetectionCatalog.AnomalyDetectionTrainers, Optional featureColumnName As String = "Features", Optional exampleWeightColumnName As String = Nothing, Optional rank As Integer = 20, Optional oversampling As Integer = 20, Optional ensureZeroMean As Boolean = true, Optional seed As Nullable(Of Integer) = Nothing) As RandomizedPcaTrainer
Parameters
- catalog
- AnomalyDetectionCatalog.AnomalyDetectionTrainers
The anomaly detection catalog trainer object.
- featureColumnName
- String
The name of the feature column. The column data must be a known-sized vector of Single.
- exampleWeightColumnName
- String
The name of the example weight column (optional). To use the weight column, the column data must be of type Single.
- rank
- Int32
The number of components in the PCA.
- oversampling
- Int32
Oversampling parameter for randomized PCA training.
- ensureZeroMean
- Boolean
If true, the data is centered to have zero mean.
- seed
- Nullable<Int32>
The seed for random number generation.
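How `rank`, `oversampling`, and `ensureZeroMean` can interact is illustrated by the randomized range finder, the first stage of the usual randomized SVD technique. The plain C# sketch below assumes (this page does not confirm it) that the trainer optionally centers the features, multiplies the data by `rank + oversampling` random Gaussian directions, and orthonormalizes the sampled columns. `RangeFinderSketch.FindRange` is a hypothetical name, and ML.NET's internal implementation may differ in every detail; a subsequent small SVD of Q^T A (not shown) would yield the `rank` principal components.

```csharp
using System;

// Hypothetical illustration of a randomized range finder; not ML.NET code.
public static class RangeFinderSketch
{
    // Returns an n x m matrix Q, m = min(rank + oversampling, d), whose orthonormal
    // columns approximately span the dominant column space of the n x d data x.
    public static double[,] FindRange(double[,] x, int rank, int oversampling,
                                      bool ensureZeroMean, int seed)
    {
        int n = x.GetLength(0), d = x.GetLength(1);
        int m = Math.Min(rank + oversampling, d);

        // Optionally center each feature (column) to zero mean.
        var a = (double[,])x.Clone();
        if (ensureZeroMean)
        {
            for (int j = 0; j < d; j++)
            {
                double mean = 0;
                for (int i = 0; i < n; i++) mean += a[i, j];
                mean /= n;
                for (int i = 0; i < n; i++) a[i, j] -= mean;
            }
        }

        // Gaussian test matrix Omega (d x m) drawn with the Box-Muller transform.
        var rng = new Random(seed);
        var omega = new double[d, m];
        for (int i = 0; i < d; i++)
            for (int j = 0; j < m; j++)
            {
                double u1 = 1.0 - rng.NextDouble(), u2 = rng.NextDouble();
                omega[i, j] = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
            }

        // Y = A * Omega (n x m) samples the range of A.
        var y = new double[n, m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < d; k++)
                    y[i, j] += a[i, k] * omega[k, j];

        // Orthonormalize the columns of Y with modified Gram-Schmidt.
        for (int j = 0; j < m; j++)
        {
            for (int p = 0; p < j; p++)
            {
                double dot = 0;
                for (int i = 0; i < n; i++) dot += y[i, p] * y[i, j];
                for (int i = 0; i < n; i++) y[i, j] -= dot * y[i, p];
            }
            double norm = 0;
            for (int i = 0; i < n; i++) norm += y[i, j] * y[i, j];
            norm = Math.Sqrt(norm);
            // A column that collapses to (near) zero is left as a zero vector.
            for (int i = 0; i < n; i++) y[i, j] = norm > 1e-12 ? y[i, j] / norm : 0.0;
        }
        return y;
    }
}
```

In the standard technique, oversampling draws a few more directions than `rank` so that the sampled subspace is more likely to capture the top components. When centering is enabled, every column of Q sums to (numerically) zero, because the centered data has zero column sums.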
Returns
RandomizedPcaTrainer
Examples
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
namespace Samples.Dynamic.Trainers.AnomalyDetection
{
    public static class RandomizedPcaSample
    {
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available
            // operations, and as the source of randomness. The seed is fixed
            // in this example to make the outputs deterministic.
            var mlContext = new MLContext(seed: 0);
            // Training data.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 1, 2} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {2, 0, 0} }
            };
            // Convert the List<DataPoint> to an IDataView, the format
            // consumed by ML.NET functions.
            var data = mlContext.Data.LoadFromEnumerable(samples);
            // Create an anomaly detector. Its underlying algorithm is
            // randomized PCA.
            var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(
                featureColumnName: nameof(DataPoint.Features), rank: 1,
                ensureZeroMean: false);
            // Train the anomaly detector.
            var model = pipeline.Fit(data);
            // Apply the trained model to the training data.
            var transformed = model.Transform(data);
            // Read ML.NET predictions into a List<Result>.
            var results = mlContext.Data.CreateEnumerable<Result>(transformed,
                reuseRowObject: false).ToList();
            // Go through all predictions.
            for (int i = 0; i < samples.Count; ++i)
            {
                // The i-th example's prediction result.
                var result = results[i];
                // The i-th example's feature vector in text format.
                var featuresInText = string.Join(',', samples[i].Features);
                if (result.PredictedLabel)
                    // The i-th sample is predicted as an outlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an outlier with a score of being outlier {2}", i,
                        featuresInText, result.Score);
                else
                    // The i-th sample is predicted as an inlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an inlier with a score of being outlier {2}", i,
                        featuresInText, result.Score);
            }
            // Expected output:
            // The 0-th example with features [0,2,1] is an inlier with a score of being outlier 0.1101028
            // The 1-th example with features [0,2,1] is an inlier with a score of being outlier 0.1101028
            // The 2-th example with features [0,2,1] is an inlier with a score of being outlier 0.1101028
            // The 3-th example with features [0,1,2] is an outlier with a score of being outlier 0.5082728
            // The 4-th example with features [0,2,1] is an inlier with a score of being outlier 0.1101028
            // The 5-th example with features [2,0,0] is an outlier with a score of being outlier 1
        }
        // Example with 3 feature values. A training data set is a collection
        // of such examples.
        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features { get; set; }
        }
        // Class used to capture the prediction for each DataPoint.
        private class Result
        {
            // True for an outlier, false for an inlier.
            public bool PredictedLabel { get; set; }
            // Inliers get smaller scores. Scores range from 0 to 1.
            public float Score { get; set; }
        }
    }
}
Remarks
By default, the threshold used to assign a label to a data point from its predicted score is 0.5. Scores range from 0 to 1, and a data point whose predicted score is higher than 0.5 is considered an outlier. Use ChangeModelThreshold<TModel>(AnomalyPredictionTransformer<TModel>, Single) to change this threshold.
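The remarks only state that scores lie between 0 and 1. One PCA anomaly score with that property is the length of the reconstruction residual divided by the length of the (optionally centered) input, which equals the sine of the angle between the vector and the principal subspace. The plain C# sketch below computes that quantity for an orthonormal basis; treat it as an assumption about how such a score can be formed, not as ML.NET's documented formula. `PcaScoreSketch` and `Score` are hypothetical names.

```csharp
using System;

// Hypothetical scoring helper; ML.NET's exact formula may differ.
public static class PcaScoreSketch
{
    // basis: k orthonormal vectors of length d spanning the principal subspace.
    // mean: feature means to subtract (all zeros when the data is not centered).
    public static double Score(double[] x, double[][] basis, double[] mean)
    {
        int d = x.Length;
        var c = new double[d];
        double norm2 = 0;
        for (int i = 0; i < d; i++) { c[i] = x[i] - mean[i]; norm2 += c[i] * c[i]; }
        // The input coincides with the mean, so there is nothing to reconstruct.
        if (norm2 == 0) return 0;

        // Squared length of the projection: sum of squared coordinates in the basis.
        double proj2 = 0;
        foreach (var u in basis)
        {
            double dot = 0;
            for (int i = 0; i < d; i++) dot += u[i] * c[i];
            proj2 += dot * dot;
        }

        // Residual length relative to input length, clamped to [0, 1].
        double ratio = Math.Max(0.0, norm2 - proj2) / norm2;
        return Math.Sqrt(Math.Min(1.0, ratio));
    }
}
```

For example, with the single unit direction (0, 2, 1)/√5 and no centering, the vector (0, 2, 1) lies in the subspace and scores 0, the vector (2, 0, 0) is orthogonal to it and scores 1, and (0, 1, 2) scores 0.6.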