Project X to Principal Components using ML.NET (PCA)
I'm trying to use the ML.NET framework to apply dimensionality reduction to a vector. I'm translating a Python function that uses scikit-learn to do it.
I've seen in the documentation that I can use the ProjectToPrincipalComponents transform to Fit() some data, which returns a PrincipalComponentAnalysisTransformer object whose Transform() method is what I need to project the vector.
The issue is that I'm not training on a data set and then projecting. I already have the trained data (the components and the mean), so I think I need to create the PrincipalComponentAnalysisTransformer object and assign that data to it, but I don't know if that is possible.
The code that I've got so far:
MLContext mlContext = new MLContext();
IDataView dvSampleFrame = mlContext.Data.LoadFromTextFile<DoubleArraySampleFrame>("some_path", separatorChar: ',', hasHeader: true);
IDataView dvMean = mlContext.Data.LoadFromTextFile<DoubleArrayMean>("some_path", separatorChar: ',', hasHeader: true);
IDataView dvComponents = mlContext.Data.LoadFromTextFile<DoubleArrayComponents>("some_path", separatorChar: ',', hasHeader: true);
var pipeline = mlContext.Transforms.ProjectToPrincipalComponents("output", "PC_Columns");
// I think I need to create an instance of this object with the parameters dvComponents and dvMean.
PrincipalComponentAnalysisTransformer trans = null;
// I think this is the method I have to apply to dvSampleFrame to obtain the vector I need.
var output = trans.Transform(dvSampleFrame);
This is the original Python code:
pca = PCA(n_components=len(pc_columns))
pca.components_ = np.transpose( reference_frame.loc[:, pc_columns].to_numpy())
pca.mean_ = reference_frame.loc[:, 'pca_mean'].to_numpy()
transformed_pcs = pca.transform([list(sample_frame.iloc[:, 0])])
The Python function takes as input:
- sample_frame: The vector I want to project.
- reference_frame: A frame containing the precalculated trained values that are set as parameters of the PCA class. Those parameters are:
  - components_: "Principal axes in feature space, representing the directions of maximum variance in the data. Equivalently, the right singular vectors of the centered input data, parallel to its eigenvectors. The components are sorted by decreasing explained_variance_."
  - mean_: defined in scikit-learn as "Per-feature empirical mean, estimated from the training set."
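For reference, with the default whiten=False, scikit-learn's pca.transform amounts to subtracting mean_ from the sample and taking the dot product with each row of components_. The following is a minimal pure-Python sketch of that arithmetic, not the scikit-learn implementation itself; the function name project and all the numbers are hypothetical and chosen only for illustration.

```python
def project(sample, mean, components):
    """Project one sample onto precomputed principal components.

    sample:     list of n_features values (the vector to project)
    mean:       list of n_features values (plays the role of pca.mean_)
    components: n_components rows of n_features values each
                (plays the role of pca.components_)
    """
    if len(sample) != len(mean):
        raise ValueError("sample and mean must have the same length")

    # Step 1: center the sample with the precomputed per-feature mean.
    centered = [s - m for s, m in zip(sample, mean)]

    # Step 2: one dot product per principal axis.
    projected = []
    for axis in components:
        if len(axis) != len(centered):
            raise ValueError("each component must have n_features values")
        projected.append(sum(c * a for c, a in zip(centered, axis)))
    return projected


# Made-up values, for illustration only (3 features, 2 components).
sample = [3.0, 4.0, 5.0]
mean = [1.0, 2.0, 3.0]
components = [
    [1.0, 0.0, 0.0],
    [0.0, 0.6, 0.8],
]

result = project(sample, mean, components)
print(result)
```

If this reading is right, the projection only needs the mean vector and the component matrix, so the same two steps could be reproduced in C# with plain arrays if PrincipalComponentAnalysisTransformer cannot be built from existing values.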