Machine Learning/Performance issue with Writing to CSV file - C#

deskcheck1-0579 411 Reputation points
2020-05-22T12:10:43.723+00:00

Hi,

I developed an ML.NET console application on .NET Core (using the Multiple classes prediction template). I've created the ML model class and am now using the model in another .NET Core console application to predict the class/type of stream bed. This app runs on my desktop, not on Azure.

The ML app reads each row from the CSV input data and predicts the type of stream bed. As it makes the prediction row by row, I store each row prediction in a StringBuilder. When all rows have been read, I call the File.WriteAllText() function.

The Machine Learning console app works fine, but my issue now is performance. When the input CSV file has over 100K rows, the app runs very slowly (it writes results to the CSV file at a rate of 1,000+ rows per hour!). Each of my data files has over 100K rows, and I need to process about 50 separate CSV files.

Is there a better way of doing this? Should I read and write each row as I go, instead of storing all the prediction rows in a StringBuilder before writing to the CSV file?

Or, is there a faster way of writing the results to a CSV file?

Appreciate any advice.

Azure Machine Learning

1 answer

  1. Ramr-msft 17,611 Reputation points
    2020-06-05T05:48:12.2+00:00

    Hi,

    Please follow the sample below from the ML.NET Taxi Fare Prediction example, which accepts the total number of records to read as an input parameter and loops over them to make predictions.

    https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/getting-started/Regression_TaxiFarePrediction/TaxiFarePrediction/TaxiFarePredictionConsoleApp/Program.cs#L192
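
    As a rough illustration of the read/predict/write loop discussed in this thread, here is a minimal sketch that streams the input file row by row and writes each result immediately, instead of collecting everything in a StringBuilder first. It is not taken from the linked sample. The names StreamingCsvScorer and ScoreFile, the added PredictedLabel column, and the predict delegate are all hypothetical; the delegate stands in for whatever prediction call your app makes, and the sketch assumes that anything expensive to set up (such as loading the model) is done once before the loop rather than once per row. It also assumes a header row and simple comma-separated values without quoted fields.

    ```csharp
    using System;
    using System.IO;

    public static class StreamingCsvScorer
    {
        // Reads inputPath one line at a time, appends a prediction as a new
        // last column, and writes each row to outputPath as soon as it is scored.
        // 'predict' is a placeholder for the app's own prediction call; it is
        // assumed to reuse a model that was loaded once, outside this method.
        public static int ScoreFile(string inputPath, string outputPath,
                                    Func<string[], string> predict)
        {
            int rowsWritten = 0;
            bool isHeader = true;

            using (var writer = new StreamWriter(outputPath))
            {
                // File.ReadLines enumerates lazily, so the whole file is never held in memory.
                foreach (string line in File.ReadLines(inputPath))
                {
                    if (isHeader)
                    {
                        // Assumption: the first line is a header row.
                        writer.WriteLine(line + ",PredictedLabel");
                        isHeader = false;
                        continue;
                    }

                    if (line.Length == 0)
                    {
                        continue; // skip blank lines
                    }

                    // Naive split: does not handle quoted fields containing commas.
                    string[] fields = line.Split(',');
                    writer.WriteLine(line + "," + predict(fields));
                    rowsWritten++;
                }
            }

            return rowsWritten;
        }
    }
    ```

    Building about 100K short rows in a StringBuilder and calling File.WriteAllText normally takes seconds at most, so streaming like this mainly keeps memory use flat. If the run is still slow, it may be worth timing the prediction call on its own (for example with System.Diagnostics.Stopwatch) to see where the time actually goes.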

    Thanks
