ML.NET gives you the ability to add machine learning to .NET applications, in either online or offline scenarios. With this capability, you can make automatic predictions using the data available to your application. Machine learning applications make use of patterns in the data to make predictions rather than needing to be explicitly programmed.
Central to ML.NET is a machine learning model. The model specifies the steps needed to transform your input data into a prediction. With ML.NET, you can train a custom model by specifying an algorithm, or you can import pretrained TensorFlow and Open Neural Network Exchange (ONNX) models.
Once you have a model, you can add it to your application to make the predictions.
ML.NET runs on Windows, Linux, and macOS using .NET, or on Windows using .NET Framework. 64 bit is supported on all platforms. 32 bit is supported on Windows, except for TensorFlow, LightGBM, and ONNX-related functionality.
The following table shows examples of the type of predictions that you can make with ML.NET.
Prediction type
Example
Classification/Categorization
Automatically divide customer feedback into positive and negative categories.
The code in the following snippet demonstrates the simplest ML.NET application. This example constructs a linear regression model to predict house prices using house size and price data.
An ML.NET model is an object that contains transformations to perform on your input data to arrive at the predicted output.
Basic
The most basic model is two-dimensional linear regression, where one continuous quantity is proportional to another, as in the house price example shown previously.
The model is simply: . The parameters and are estimated by fitting a line on a set of (size, price) pairs. The data used to find the parameters of the model is called training data. The inputs of a machine learning model are called features. In this example, is the only feature. The ground-truth values used to train a machine learning model are called labels. Here, the values in the training data set are the labels.
More complex
A more complex model classifies financial transactions into categories using the transaction text description.
Each transaction description is broken down into a set of features by removing redundant words and characters, and counting word and character combinations. The feature set is used to train a linear model based on the set of categories in the training data. The more similar a new description is to the ones in the training set, the more likely it will be assigned to the same category.
Both the house price model and the text classification model are linear models. Depending on the nature of your data and the problem you're solving, you can also use decision tree models, generalized additive models, and others. You can find out more about the models in Tasks.
Data preparation
In most cases, the data that you have available isn't suitable to be used directly to train a machine learning model. The raw data needs to be prepared, or preprocessed, before it can be used to find the parameters of your model. Your data might need to be converted from string values to a numerical representation. You might have redundant information in your input data. You might need to reduce or expand the dimensions of your input data. Your data might need to be normalized or scaled.
The ML.NET tutorials teach you about different data processing pipelines for text, image, numerical, and time-series data used for specific machine learning tasks.
Once you've trained your model, how do you know how well it will make future predictions? With ML.NET, you can evaluate your model against some new test data.
Each type of machine learning task has metrics used to evaluate the accuracy and precision of the model against the test data set.
The house price example shown earlier used the Regression task. To evaluate the model, add the following code to the original sample.
C#
HouseData[] testHouseData =
{
new HouseData() { Size = 1.1F, Price = 0.98F },
new HouseData() { Size = 1.9F, Price = 2.1F },
new HouseData() { Size = 2.8F, Price = 2.9F },
new HouseData() { Size = 3.4F, Price = 3.6F }
};
var testHouseDataView = mlContext.Data.LoadFromEnumerable(testHouseData);
var testPriceDataView = model.Transform(testHouseDataView);
var metrics = mlContext.Regression.Evaluate(testPriceDataView, labelColumnName: "Price");
Console.WriteLine($"R^2: {metrics.RSquared:0.##}");
Console.WriteLine($"RMS error: {metrics.RootMeanSquaredError:0.##}");
// R^2: 0.96// RMS error: 0.19
The evaluation metrics tell you that the error is low-ish, and that correlation between the predicted output and the test output is high. That was easy! In real examples, it takes more tuning to achieve good model metrics.
ML.NET architecture
This section describes the architectural patterns of ML.NET. If you're an experienced .NET developer, some of these patterns will be familiar to you, and some will be less familiar.
An ML.NET application starts with an MLContext object. This singleton object contains catalogs. A catalog is a factory for data loading and saving, transforms, trainers, and model operation components. Each catalog object has methods to create the different types of components.
In the snippet, Concatenate and Sdca are both methods in the catalog. They each create an IEstimator object that's appended to the pipeline.
At this point, the objects have been created, but no execution has happened.
Train the model
Once the objects in the pipeline have been created, data can be used to train the model.
C#
var model = pipeline.Fit(trainingData);
Calling Fit() uses the input training data to estimate the parameters of the model. This is known as training the model. Remember, the linear regression model shown earlier had two model parameters: bias and weight. After the Fit() call, the values of the parameters are known. (Most models will have many more parameters than this.)
You can transform input data into predictions in bulk, or one input at a time. The house price example did both: in bulk to evaluate the model, and one at a time to make a new prediction. Let's look at making single predictions.
C#
var size = new HouseData() { Size = 2.5F };
var predEngine = mlContext.CreatePredictionEngine<HouseData, Prediction>(model);
var price = predEngine.Predict(size);
The CreatePredictionEngine() method takes an input class and an output class. The field names or code attributes determine the names of the data columns used during model training and prediction. For more information, see Make predictions with a trained model.
Data models and schema
At the core of an ML.NET machine learning pipeline are DataView objects.
Each transformation in the pipeline has an input schema (data names, types, and sizes that the transform expects to see on its input); and an output schema (data names, types, and sizes that the transform produces after the transformation).
If the output schema from one transform in the pipeline doesn't match the input schema of the next transform, ML.NET will throw an exception.
A data view object has columns and rows. Each column has a name and a type and a length. For example, the input columns in the house price example are Size and Price. They are both type Single and they're scalar quantities rather than vector ones.
All ML.NET algorithms look for an input column that's a vector. By default, this vector column is called Features. That's why the house price example concatenated the Size column into a new column called Features.
C#
var pipeline = mlContext.Transforms.Concatenate("Features", new[] { "Size" })
All algorithms also create new columns after they've performed a prediction. The fixed names of these new columns depend on the type of machine learning algorithm. For the regression task, one of the new columns is called Score as shown in the price data attribute.
You can find out more about output columns of different machine learning tasks in the Machine Learning Tasks guide.
An important property of DataView objects is that they're evaluated lazily. Data views are only loaded and operated on during model training and evaluation, and data prediction. While you're writing and testing your ML.NET application, you can use the Visual Studio debugger to take a peek at any data view object by calling the Preview method.
C#
var debug = testPriceDataView.Preview();
You can watch the debug variable in the debugger and examine its contents.
Note
Don't use the Preview(IDataView, Int32) method in production code, as it significantly degrades performance.
Model deployment
In real-life applications, your model training and evaluation code will be separate from your prediction. In fact, these two activities are often performed by separate teams. Your model development team can save the model for use in the prediction application.
The source for this content can be found on GitHub, where you can also create and review issues and pull requests. For more information, see our contributor guide.
.NET
feedback
.NET
is an open source project. Select a link to provide feedback:
Manage data ingestion and preparation, model training and deployment, and machine learning solution monitoring with Python, Azure Machine Learning and MLflow.