System.ArgumentOutOfRangeException: 'Could not find column 'EncodedTaskStatuses' (Parameter 'Schema')'
Hello everyone, I'm trying to build a multiclass classification model using ML.NET.
I'm trying to predict the job status, which can take 11 different values. The job status depends on these features:
bool IsPartiallyBilled
int[] TaskStatuses
int[] TaskResourceStatuses
So the input data class is this:
public class JobData
{
    public bool IsPartiallyBilled { get; set; }
    [VectorType]
    public int[] TaskStatuses { get; set; }
    [VectorType]
    public int[] TaskResourceStatuses { get; set; }
    public int JobStatus { get; set; }
}
Here are the additional classes for the custom transformation:
public class TransformedOutput
{
    public float[] FinalEncoding { get; set; }
    public int Label { get; set; }
}
class TransformedInput
{
    [VectorType]
    public float[] EncodedTaskStatuses { get; set; }
    [VectorType]
    public float[] EncodedTaskResourceStatuses { get; set; }
}
And this is the prediction class:
public class JobPrediction
{
    [ColumnName("Score")]
    public float JobStatus { get; set; }
}
And finally this is the rest of the code:
var mlContext = new MLContext();
// Load the data, LoadData method returns List<JobData>
var data = mlContext.Data.LoadFromEnumerable(LoadData());
// Create pipeline
var dataPrep = mlContext.Transforms.Categorical.OneHotEncoding("EncodedTaskStatuses", nameof(JobData.TaskStatuses))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("EncodedTaskResourceStatuses", nameof(JobData.TaskResourceStatuses)));
IDataView transformedData = dataPrep.Fit(data).Transform(data);
var encodedTaskStatusesVectorType = transformedData.Schema["EncodedTaskStatuses"].Type as VectorDataViewType;
var encodedTaskResourcesStatusesVectorType = transformedData.Schema["EncodedTaskResourceStatuses"].Type as VectorDataViewType;
var encodedTaskStatusesVectorDimensions = encodedTaskStatusesVectorType.Dimensions[1];
var encodedTaskResourcesStatusesVectorDimensions = encodedTaskResourcesStatusesVectorType.Dimensions[1];
// I guess I need a custom transformation because all features have to end up in one single vector. Maybe I'm wrong, or maybe the way I'm doing the custom transformation is wrong.
Action<TransformedInput, TransformedOutput> customTransform = (rowIn, rowOut) =>
{
    float[] unifiedEncoding = new float[encodedTaskStatusesVectorDimensions + encodedTaskResourcesStatusesVectorDimensions];
    var taskStatusesIndices = rowIn.EncodedTaskStatuses
        .Select((x, i) => new { x, i })
        .Where(x => x.x == 1)
        .Select(x => x.i);
    var taskResourceStatusesIndices = rowIn.EncodedTaskResourceStatuses
        .Select((x, i) => new { x, i })
        .Where(x => x.x == 1)
        .Select(x => x.i);
    foreach (var idx in taskStatusesIndices)
    {
        var mappedIdx = idx % encodedTaskStatusesVectorDimensions;
        unifiedEncoding[mappedIdx] = 1;
    }
    foreach (var idx in taskResourceStatusesIndices)
    {
        var mappedIdx = idx % encodedTaskResourcesStatusesVectorDimensions;
        unifiedEncoding[mappedIdx] = 1;
    }
    rowOut.FinalEncoding = unifiedEncoding;
};
var outputSchemaDefinition = SchemaDefinition.Create(typeof(TransformedOutput));
outputSchemaDefinition["FinalEncoding"].ColumnType = new VectorDataViewType(NumberDataViewType.Single, encodedTaskStatusesVectorDimensions + encodedTaskResourcesStatusesVectorDimensions);
var trainingPipeline = mlContext.Transforms.CustomMapping(customTransform, null, outputSchemaDefinition: outputSchemaDefinition)
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Label", "JobStatus"))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(featureColumnName: "FinalEncoding"));
var model = trainingPipeline.Fit(transformedData);
// Use the model to predict job status for a new data point
var predictionEngine = mlContext.Model.CreatePredictionEngine<JobData, JobPrediction>(model);
var result = predictionEngine.Predict(new JobData()
{
    IsPartiallyBilled = true,
    TaskResourceStatuses = new int[] { 4, 4, 4 },
    TaskStatuses = new int[] { 4, 4, 4 },
    JobStatus = 4
});
But when I create the prediction engine I get this error:
System.ArgumentOutOfRangeException: 'Could not find column 'EncodedTaskStatuses' (Parameter 'Schema')'
I think I may need to map EncodedTaskStatuses and EncodedTaskResourceStatuses back to their original names, TaskStatuses and TaskResourceStatuses. If so, I'm not sure how to do that, and if not, I'm rather lost.
If someone could give me a hand, I would appreciate it.
Thanks in advance!