Not able to re-train model [Multiclassification(AveragedPerceptron)]

stefan 1 Reputation point
2021-10-07T15:40:19.467+00:00

Hello! I am new to ML.NET and have decided to try using it to build a dispatcher. Basically, I want it to classify text into one of multiple categories. Due to the high volume of data, when users confirm that a prediction is wrong, I want to add that example to its database (or re-train the model).

I used AutoML to generate a base model. The algorithm with the best results chosen by AutoML for multiclass classification is AveragedPerceptron. I have checked this page to make sure that it is re-trainable.

I am able to get the first model, but I am struggling to re-train it.

First, I create the model (reproducing all the steps generated by AutoML):

// First Phase: Create the model

        var mlContext = new MLContext(seed: 1);  


        // BuildTrainingPipeline  

        // Load Data  
        var data = mlContext.Data.LoadFromTextFile<ModelInput>(  
                                        path: TRAIN_DATA_FILEPATH,  
                                        hasHeader: false,  
                                        separatorChar: '\t',  
                                        allowQuoting: true,  
                                        allowSparse: false);  


        // Data process configuration with pipeline data transformations  
        var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("col0", "col0")  
                                  .Append(mlContext.Transforms.Text.FeaturizeText("col1_tf", "col1"))  
                                  .Append(mlContext.Transforms.CopyColumns("Features", "col1_tf"))  
                                  .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))  
                                  .AppendCacheCheckpoint(mlContext);  



        // Set the training algorithm   
        var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(mlContext.BinaryClassification.Trainers  
                                 .AveragedPerceptron(labelColumnName: "col0", numberOfIterations: 10, featureColumnName: "Features"), labelColumnName: "col0")  
                                  .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));  


        IEstimator<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);  



        // Train and save Model  


        // Create model here  
        ITransformer firstModel = trainingPipeline.Fit(data);  

        // Save the model  
        mlContext.Model.Save(firstModel, data.Schema, MODEL_FILEPATH);  
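
For context, here is a minimal sketch of the `ModelInput` class that the code in this post relies on. It is not shown elsewhere in the post, so the shape below is an assumption based on the column names used in the pipeline (`col0` as the label, `col1` as the text) and on what AutoML typically generates; the column indices in `LoadColumn` are assumed as well.

```csharp
using Microsoft.ML.Data;

// Assumed input schema: one label column and one free-text column.
public class ModelInput
{
    // Category label; mapped to the "col0" column used by MapValueToKey.
    [ColumnName("col0"), LoadColumn(0)]
    public string Col0 { get; set; }

    // Text to classify; mapped to the "col1" column used by FeaturizeText.
    [ColumnName("col1"), LoadColumn(1)]
    public string Col1 { get; set; }
}
```

The `ColumnName` attributes matter here because the pipeline refers to the lowercase names `col0` and `col1`, while the C# properties are `Col0` and `Col1`.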

Then, assuming I have new data to re-train the model with:

// Second Phase: Re-train the model

        // New Data  
        ModelInput[] ticketData = new ModelInput[]  
        {  

              new ModelInput  
              {  
                  Col0 = "Category 3",  
                  Col1 = "Text to classify 1"  
              },  

              new ModelInput  
              {  
                  Col0 = "Category 2",  
                  Col1 = "Text to classify 2"  
              },  

              new ModelInput  
              {  
                  Col0 = "Category 3",  
                  Col1 = "Text to classify 3"  
              },  

              new ModelInput  
              {  
                  Col0 = "Category 2",  
                  Col1 = "Text to classify 4"  
              },  

              new ModelInput  
              {  
                  Col0 = "Category 1",  
                  Col1 = "Text to classify 5"  
              },  

        };  



        // Create MLContext  
        MLContext mlContext = new MLContext();  

        // Define DataViewSchema for the trained model  
        DataViewSchema modelSchema;  

        // Load trained model  
        var trainedModel = mlContext.Model.Load(MODEL_FILEPATH, out modelSchema);  

        //Load New Data  
        IDataView newData = mlContext.Data.LoadFromEnumerable<ModelInput>(ticketData);  
   


And here I get stuck, because I don't know how to re-train the model with the new data. I have tried to follow the guidance from these topics: [here](https://learn.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/retrain-model-ml-net), [here](https://github.com/dotnet/machinelearning/blob/36fab9b6806260e64e50992450a219e869c7f74a/test/Microsoft.ML.Functional.Tests/Training.cs#L80-L118), and the changes suggested [here](https://github.com/dotnet/machinelearning/issues/5247), but with no result.

Issue

I think my issues are due to the multiclass setup, because the trainer is of type EstimatorChain and my model is of type TransformerChain.
My trainer.Fit doesn't take two arguments.

.NET Machine learning