Project X to Principal Components using ML.NET (PCA)

Hector Badenes
2023-11-20T13:42:12.7566667+00:00

I'm trying to use the ML.NET framework to apply dimensionality reduction to a vector. I'm porting a Python function that does this with scikit-learn.

I've seen in the documentation that I can use the ProjectToPrincipalComponents transform and call Fit() on some data, which returns a PrincipalComponentAnalysisTransformer object whose Transform() method is what I need to project the vector.

The issue is that I'm not fitting on a data set and then projecting. I already have the fitted values (the components and the mean), so I think I need to create the PrincipalComponentAnalysisTransformer object and assign those values to it, but I don't know if that is possible.

The code that I've got so far:

MLContext mlContext = new MLContext();
IDataView dvSampleFrame = mlContext.Data.LoadFromTextFile<DoubleArraySampleFrame>(
    "some_path", separatorChar: ',', hasHeader: true);
IDataView dvMean = mlContext.Data.LoadFromTextFile<DoubleArrayMean>(
    "some_path", separatorChar: ',', hasHeader: true);
IDataView dvComponents = mlContext.Data.LoadFromTextFile<DoubleArrayComponents>(
    "some_path", separatorChar: ',', hasHeader: true);

var pipeline = mlContext.Transforms.ProjectToPrincipalComponents("output", "PC_Columns");

// I think I need to create an instance of this object with the parameters dvComponents and dvMean.
PrincipalComponentAnalysisTransformer trans = null;

// I think this is the method I have to apply to dvSampleFrame to obtain the vector I need.
var output = trans.Transform(dvSampleFrame);

This is the original Python code:

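import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=len(pc_columns))
# components_ must have shape (n_components, n_features), hence the transpose
pca.components_ = np.transpose(
    reference_frame.loc[:, pc_columns].to_numpy())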
pca.mean_ = reference_frame.loc[:, 'pca_mean'].to_numpy()

transformed_pcs = pca.transform([list(sample_frame.iloc[:, 0])])

The Python function takes as input:

  • sample_frame: The vector I want to project.
  • reference_frame: a frame containing the precomputed fitted values that are assigned to attributes of the PCA object. Those attributes are:
    • components_: defined in scikit-learn as "Principal axes in feature space, representing the directions of maximum variance in the data. Equivalently, the right singular vectors of the centered input data, parallel to its eigenvectors. The components are sorted by decreasing explained_variance_."
    • mean_: defined in scikit-learn as "Per-feature empirical mean, estimated from the training set."
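
For reference, with the default whiten=False, scikit-learn's PCA.transform subtracts mean_ from the input and multiplies the result by the transpose of components_. A minimal sketch of that arithmetic in plain Python follows. The helper name project_to_components and all numeric values are made up for illustration, and it makes no assumption about what ML.NET exposes:

```python
def project_to_components(sample, components, mean):
    """Project one sample onto precomputed principal axes.

    sample:     list of n_features values
    components: list of n_components rows, each with n_features values
    mean:       list of n_features per-feature means
    """
    # Center the sample by subtracting the per-feature mean.
    centered = [x - m for x, m in zip(sample, mean)]
    # Each output coordinate is the dot product of one axis with the centered sample.
    return [sum(c * v for c, v in zip(axis, centered)) for axis in components]


# Made-up values for illustration only (3 features, 2 components).
components = [[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]]
mean = [1.0, 2.0, 3.0]
sample = [2.0, 4.0, 6.0]

projected = project_to_components(sample, components, mean)
print(projected)  # prints [1.0, 2.5]
```

The same two steps (subtract the mean, then take one dot product per component) should carry over to a plain C# loop over double[] arrays, which may be an alternative if constructing the transformer from precomputed values turns out not to be possible.
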

Is it possible to do what I'm trying with ML.NET? Am I using the correct approach?
.NET Machine learning