System.ArgumentOutOfRangeException: 'Could not find column 'EncodedTaskStatuses' (Parameter 'Schema')'

Matias 0 Reputation points
2023-02-01T15:07:20.9233333+00:00

Hello everyone, I'm trying to build a multiclass classification model using ML.NET.
I'm trying to predict the job status, which can take 11 different values. The job status depends on these features:
bool IsPartiallyBilled
int[] TaskStatuses
int[] TaskResourceStatuses

So the input data class is this:

    public class JobData
    {
        public bool IsPartiallyBilled { get; set; }
        [VectorType]
        public int[] TaskStatuses { get; set; }
        [VectorType]
        public int[] TaskResourceStatuses { get; set; }
        public int JobStatus { get; set; }
    }

Here are the additional classes for the custom transformation:

    public class TransformedOutput
    {
        public float[] FinalEncoding { get; set; }
        public int Label { get; set; }
    }

    class TransformedInput
    {
        [VectorType]
        public float[] EncodedTaskStatuses { get; set; }
        [VectorType]
        public float[] EncodedTaskResourceStatuses { get; set; }

    }

And this is the prediction class:

    public class JobPrediction
    {
        [ColumnName("Score")]
        public float JobStatus { get; set; }
    }

And finally this is the rest of the code:

    var mlContext = new MLContext();

    // Load the data; the LoadData method returns List<JobData>
    var data = mlContext.Data.LoadFromEnumerable(LoadData());

    // Create pipeline
    var dataPrep = mlContext.Transforms.Categorical.OneHotEncoding("EncodedTaskStatuses", nameof(JobData.TaskStatuses))
        .Append(mlContext.Transforms.Categorical.OneHotEncoding("EncodedTaskResourceStatuses", nameof(JobData.TaskResourceStatuses)));

    IDataView transformedData = dataPrep.Fit(data).Transform(data);

    var encodedTaskStatusesVectorType = transformedData.Schema["EncodedTaskStatuses"].Type as VectorDataViewType;
    var encodedTaskResourcesStatusesVectorType = transformedData.Schema["EncodedTaskResourceStatuses"].Type as VectorDataViewType;

    var encodedTaskStatusesVectorDimensions = encodedTaskStatusesVectorType.Dimensions[1];
    var encodedTaskResourcesStatusesVectorDimensions = encodedTaskResourcesStatusesVectorType.Dimensions[1];

    // I guess I need a custom transformation because all features have to end up in one single vector.
    // Maybe I'm wrong, or maybe the way I'm doing the custom transformation is wrong.
    Action<TransformedInput, TransformedOutput> customTransform = (rowIn, rowOut) =>
    {
        float[] unifiedEncoding = new float[encodedTaskStatusesVectorDimensions + encodedTaskResourcesStatusesVectorDimensions];

        var taskStatusesIndices = rowIn.EncodedTaskStatuses
            .Select((x, i) => new { x, i })
            .Where(x => x.x == 1)
            .Select(x => x.i);

        var taskResourceStatusesIndices = rowIn.EncodedTaskResourceStatuses
            .Select((x, i) => new { x, i })
            .Where(x => x.x == 1)
            .Select(x => x.i);

        foreach (var idx in taskStatusesIndices)
        {
            var mappedIdx = idx % encodedTaskStatusesVectorDimensions;
            unifiedEncoding[mappedIdx] = 1;
        }

        foreach (var idx in taskResourceStatusesIndices)
        {
            var mappedIdx = idx % encodedTaskResourcesStatusesVectorDimensions;
            unifiedEncoding[mappedIdx] = 1;
        }

        rowOut.FinalEncoding = unifiedEncoding;
    };

    var outputSchemaDefinition = SchemaDefinition.Create(typeof(TransformedOutput));
    outputSchemaDefinition["FinalEncoding"].ColumnType = new VectorDataViewType(NumberDataViewType.Single, encodedTaskStatusesVectorDimensions + encodedTaskResourcesStatusesVectorDimensions);

    var trainingPipeline = mlContext.Transforms.CustomMapping(customTransform, null, outputSchemaDefinition: outputSchemaDefinition)
        .Append(mlContext.Transforms.Conversion.MapValueToKey("Label", "JobStatus"))
        .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(featureColumnName: "FinalEncoding"));

    var model = trainingPipeline.Fit(transformedData);

    // Use the model to predict the job status for a new data point
    var predictionEngine = mlContext.Model.CreatePredictionEngine<JobData, JobPrediction>(model);
    var result = predictionEngine.Predict(new JobData()
    {
        IsPartiallyBilled = true,
        TaskResourceStatuses = new int[] { 4, 4, 4 },
        TaskStatuses = new int[] { 4, 4, 4 },
        JobStatus = 4
    });

But when I create the prediction engine I get this error:

System.ArgumentOutOfRangeException: 'Could not find column 'EncodedTaskStatuses' (Parameter 'Schema')'

I think I'm missing a step that maps EncodedTaskStatuses and EncodedTaskResourceStatuses back to their original names, TaskStatuses and TaskResourceStatuses. If so, I'm not quite sure how to do that, and if not, I'm kind of lost.
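
One possible reading of the error, offered as an untested sketch and not a confirmed fix: `model` was fitted on `transformedData`, so its input schema expects the encoded columns, while the prediction engine is handed raw `JobData` rows that only carry `TaskStatuses` and `TaskResourceStatuses`. Under that assumption, chaining the fitted data-prep transformer with the trained model might give a single transformer that starts from `JobData`. The names `dataPrepModel`, `trainedModel`, and `fullModel` are introduced here for illustration only; everything else is reused from the code above.

```csharp
// Untested sketch; requires `using Microsoft.ML;` and reuses mlContext, data,
// dataPrep, and trainingPipeline from the code above.

// Instead of dataPrep.Fit(data).Transform(data), keep the fitted data-prep transformer.
var dataPrepModel = dataPrep.Fit(data);
IDataView transformedData = dataPrepModel.Transform(data);

// ... compute the vector dimensions and build trainingPipeline exactly as above ...

// Fit the trainer pipeline on the encoded data, as before.
var trainedModel = trainingPipeline.Fit(transformedData);

// Chain the two fitted transformers so the combined model accepts the raw JobData columns.
var fullModel = dataPrepModel.Append(trainedModel);

var predictionEngine = mlContext.Model.CreatePredictionEngine<JobData, JobPrediction>(fullModel);
```

This only targets the missing-column error. Whether the rest of the pipeline (the custom mapping and the `JobPrediction` class) then works as intended has not been checked.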

If someone could give me a hand I would appreciate it.
Thanks in advance!

.NET Machine learning