RandomizedPcaTrainer Class

Reference

Definition

Namespace: Microsoft.ML.Trainers

Assembly: Microsoft.ML.PCA.dll

Package: Microsoft.ML v3.0.1

The IEstimator<TTransformer> for training an approximate PCA model using a randomized SVD algorithm.

public sealed class RandomizedPcaTrainer : Microsoft.ML.Trainers.TrainerEstimatorBase<Microsoft.ML.Data.AnomalyPredictionTransformer<Microsoft.ML.Trainers.PcaModelParameters>,Microsoft.ML.Trainers.PcaModelParameters>

type RandomizedPcaTrainer = class
    inherit TrainerEstimatorBase<AnomalyPredictionTransformer<PcaModelParameters>, PcaModelParameters>

Public NotInheritable Class RandomizedPcaTrainer
Inherits TrainerEstimatorBase(Of AnomalyPredictionTransformer(Of PcaModelParameters), PcaModelParameters)

Inheritance: Object → TrainerEstimatorBase<AnomalyPredictionTransformer<PcaModelParameters>,PcaModelParameters> → RandomizedPcaTrainer

Remarks

To create this trainer, use RandomizedPca or RandomizedPca(Options).
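As a rough sketch of the two creation paths named above: the `DataPoint` type, the toy rows, and the parameter values below are illustrative assumptions, not recommendations, and the snippet assumes the Microsoft.ML package is referenced.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

public static class CreateRandomizedPca
{
    // Hypothetical input type: a known-sized vector of Single (float).
    public class DataPoint
    {
        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static AnomalyPredictionTransformer<PcaModelParameters> Train(bool useOptions)
    {
        var mlContext = new MLContext(seed: 0);

        // Made-up toy data; the last row is meant to look unusual.
        var samples = new List<DataPoint>
        {
            new DataPoint { Features = new float[] { 0, 2, 1 } },
            new DataPoint { Features = new float[] { 0, 2, 1 } },
            new DataPoint { Features = new float[] { 0, 1, 2 } },
            new DataPoint { Features = new float[] { 2, 0, 0 } },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        RandomizedPcaTrainer trainer;
        if (useOptions)
        {
            // RandomizedPca(Options): settings are passed as an options object.
            trainer = mlContext.AnomalyDetection.Trainers.RandomizedPca(
                new RandomizedPcaTrainer.Options
                {
                    FeatureColumnName = nameof(DataPoint.Features),
                    Rank = 1,
                    Seed = 10,
                });
        }
        else
        {
            // RandomizedPca: settings are passed as named arguments.
            trainer = mlContext.AnomalyDetection.Trainers.RandomizedPca(
                featureColumnName: nameof(DataPoint.Features),
                rank: 1,
                ensureZeroMean: false);
        }

        // Fit returns the typed anomaly prediction transformer.
        return trainer.Fit(data);
    }
}
```

Either path yields the same trainer type; the options overload is mainly convenient when several settings are configured together.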

Input and Output Columns

The input features column data must be a known-sized vector of Single. This trainer outputs the following columns:

Output Column Name	Column Type	Description
`Score`	Single	The non-negative, unbounded score that was calculated by the anomaly detection model.
`PredictedLabel`	Boolean	The predicted label, based on the threshold. A score higher than the threshold maps to `true` and a score lower than the threshold maps to `false`. The default threshold is `0.5`. Use AnomalyDetectionCatalog.ChangeModelThreshold to change the default value.
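One possible way to read these columns back and apply a different threshold is sketched below. The `DataPoint` and `Result` classes and the toy rows are assumptions made for illustration; only the column names `Score` and `PredictedLabel` come from the table above.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class ReadPcaOutputs
{
    // Hypothetical input type.
    public class DataPoint
    {
        [VectorType(3)]
        public float[] Features { get; set; }
    }

    // Property names match the output columns listed above.
    public class Result
    {
        public bool PredictedLabel { get; set; }
        public float Score { get; set; }
    }

    public static List<Result> ScoreWithThreshold(float threshold)
    {
        var mlContext = new MLContext(seed: 0);

        // Made-up toy data.
        var samples = new List<DataPoint>
        {
            new DataPoint { Features = new float[] { 0, 2, 1 } },
            new DataPoint { Features = new float[] { 0, 2, 1 } },
            new DataPoint { Features = new float[] { 0, 1, 2 } },
            new DataPoint { Features = new float[] { 2, 0, 0 } },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        var model = mlContext.AnomalyDetection.Trainers
            .RandomizedPca(
                featureColumnName: nameof(DataPoint.Features),
                rank: 1,
                ensureZeroMean: false)
            .Fit(data);

        // Swap the default threshold for the caller's value.
        var adjusted = mlContext.AnomalyDetection.ChangeModelThreshold(model, threshold);

        // Transform appends Score and PredictedLabel to each row.
        IDataView scored = adjusted.Transform(data);
        return mlContext.Data
            .CreateEnumerable<Result>(scored, reuseRowObject: false)
            .ToList();
    }
}
```

Changing the threshold only changes how `Score` is mapped to `PredictedLabel`; the model is not retrained.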

Trainer Characteristics


Machine learning task	Anomaly Detection
Is normalization required?	Yes
Is caching required?	No
Required NuGet in addition to Microsoft.ML	None
Exportable to ONNX	No
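Since the table marks normalization as required, a pipeline would typically place a normalizer ahead of the trainer. The sketch below uses `NormalizeMeanVariance` as one possible choice; the `DataPoint` type and the `rank` value are assumptions for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class NormalizedPcaPipeline
{
    // Hypothetical input type.
    public class DataPoint
    {
        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static ITransformer Fit(MLContext mlContext, IEnumerable<DataPoint> samples)
    {
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Normalize the feature vector in place, then train on the result.
        var pipeline = mlContext.Transforms
            .NormalizeMeanVariance(nameof(DataPoint.Features))
            .Append(mlContext.AnomalyDetection.Trainers.RandomizedPca(
                featureColumnName: nameof(DataPoint.Features),
                rank: 1));

        return pipeline.Fit(data);
    }
}
```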

Training Algorithm Details

This trainer uses the top eigenvectors to approximate the subspace containing the normal class. For each new instance, it computes the norm of the difference between the raw feature vector and its projection onto that subspace. If this reconstruction error is close to 0, the instance is considered normal (non-anomaly).

More specifically, this trainer trains an approximate PCA using a randomized method for computing the singular value decomposition (SVD) of the matrix whose rows are the input vectors. The model generated by this trainer contains three parameters:

A projection matrix $U$
The mean vector in the original feature space $m$
The mean vector in the projected feature space $p$

For an input feature vector $x$, the anomaly score is computed by comparing the $L_2$ norm of the original input vector, and the $L_2$ norm of the projected vector: $\sqrt{\left(\|x-m\|_2^2 - \|Ux-p\|_2^2\right)\|x-m\|_2^2}$.
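To make the expression concrete, the following sketch transcribes it exactly as written above using plain arrays. It is not taken from the library's source; the class name, the row-per-component layout of `u`, and the clamp at zero are assumptions of this sketch.

```csharp
using System;

public static class PcaScoreSketch
{
    // u: one projection row per retained component (rank x dimension).
    // m: mean in the original feature space; p: mean in the projected space.
    public static double Score(double[][] u, double[] m, double[] p, double[] x)
    {
        // Squared L2 norm of the centered input, ||x - m||^2.
        double centeredNorm2 = 0.0;
        for (int j = 0; j < x.Length; j++)
        {
            double d = x[j] - m[j];
            centeredNorm2 += d * d;
        }

        // Squared L2 norm of the centered projection, ||Ux - p||^2.
        double projectedNorm2 = 0.0;
        for (int i = 0; i < u.Length; i++)
        {
            double dot = 0.0;
            for (int j = 0; j < x.Length; j++)
            {
                dot += u[i][j] * x[j];
            }
            double c = dot - p[i];
            projectedNorm2 += c * c;
        }

        // Assumption: clamp at zero so rounding cannot produce a negative radicand.
        double residual = Math.Max(0.0, centeredNorm2 - projectedNorm2);
        return Math.Sqrt(residual * centeredNorm2);
    }
}
```

For example, with $U = (1\ \ 0)$, $m = (0, 0)$ and $p = (0)$, the input $x = (3, 4)$ gives $\sqrt{(25 - 9)\cdot 25} = 20$, while $x = (5, 0)$ lies entirely in the retained subspace and gives 0.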

The method is described here.

Note that the algorithm can be made into Kernel PCA by applying the ApproximatedKernelTransformer to the data before passing it to the trainer.
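A minimal sketch of that kernel variant, assuming the kernel step is added through the `ApproximatedKernelMap` catalog method: the `KernelFeatures` column name, the `DataPoint` type, and both `rank` values are hypothetical choices for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class KernelPcaPipeline
{
    // Hypothetical input type.
    public class DataPoint
    {
        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static ITransformer Fit(MLContext mlContext, IEnumerable<DataPoint> samples)
    {
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Map the features into an approximate kernel feature space first,
        // then train the PCA anomaly detector on the mapped column.
        var pipeline = mlContext.Transforms
            .ApproximatedKernelMap(
                outputColumnName: "KernelFeatures",
                inputColumnName: nameof(DataPoint.Features),
                rank: 8)
            .Append(mlContext.AnomalyDetection.Trainers.RandomizedPca(
                featureColumnName: "KernelFeatures",
                rank: 2));

        return pipeline.Fit(data);
    }
}
```

The trainer's `rank` must not exceed the dimension of the mapped column, which here is set by the kernel map's own `rank`.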

Check the See Also section for links to usage examples.

Fields

FeatureColumn	The feature column that the trainer expects. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)
LabelColumn	The label column that the trainer expects. Can be `null`, which indicates that label is not used for training. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)
WeightColumn	The weight column that the trainer expects. Can be `null`, which indicates that weight is not used for training. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)

Properties

Info

Methods

Fit(IDataView)	Trains and returns an ITransformer. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)
GetOutputSchema(SchemaShape)	(Inherited from TrainerEstimatorBase<TTransformer,TModel>)

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Append a 'caching checkpoint' to the estimator chain. This will ensure that the downstream estimators will be trained against cached data. It is helpful to have a caching checkpoint before trainers that take multiple data passes.
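As an illustration of where such a checkpoint might sit, the sketch below caches the output of a normalizer before the trainer runs. The `DataPoint` type and the choice of normalizer are assumptions. Caching is listed as not required for this trainer, so treat this as a placement example rather than a recommendation.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class CachedPcaPipeline
{
    // Hypothetical input type.
    public class DataPoint
    {
        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static ITransformer Fit(MLContext mlContext, IEnumerable<DataPoint> samples)
    {
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        var pipeline = mlContext.Transforms
            .NormalizeMeanVariance(nameof(DataPoint.Features))
            // Estimators appended after this point train against cached data.
            .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.AnomalyDetection.Trainers.RandomizedPca(
                featureColumnName: nameof(DataPoint.Features),
                rank: 1));

        return pipeline.Fit(data);
    }
}
```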

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, return a wrapping object that calls a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. At the same time, IEstimator<TTransformer> instances are often formed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer>, where the estimator whose transformer we want is buried somewhere in the chain. For that scenario, this method lets us attach a delegate that is called once Fit is called.
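The scenario described above might look like the following sketch, where the trainer sits inside a longer chain and its typed transformer is captured through the delegate. The `DataPoint` type, the normalizer, and the variable names are illustrative assumptions.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

public static class CapturePcaTransformer
{
    // Hypothetical input type.
    public class DataPoint
    {
        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static AnomalyPredictionTransformer<PcaModelParameters> FitAndCapture(
        MLContext mlContext, IEnumerable<DataPoint> samples)
    {
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Filled in by the delegate when the chain is fit.
        AnomalyPredictionTransformer<PcaModelParameters> captured = null;

        var trainer = mlContext.AnomalyDetection.Trainers.RandomizedPca(
            featureColumnName: nameof(DataPoint.Features),
            rank: 1);

        var pipeline = mlContext.Transforms
            .NormalizeMeanVariance(nameof(DataPoint.Features))
            .Append(trainer.WithOnFitDelegate(t => captured = t));

        // Fitting the chain fits the trainer, which triggers the delegate.
        pipeline.Fit(data);
        return captured;
    }
}
```

The captured object keeps its specific type, so the trained `PcaModelParameters` stay reachable through its `Model` property even though the chain as a whole is fit as one unit.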
