Događaj
Izgradite inteligentne aplikacije
17. mar 23 - 21. mar 23
Pridružite se seriji sastanaka kako biste izgradili skalabilna AI rešenja zasnovana na stvarnim slučajevima korišćenja sa kolegama programerima i stručnjacima.
Registrujte se odmahOvaj pregledač više nije podržan.
Nadogradite na Microsoft Edge biste iskoristili najnovije funkcije, bezbednosne ispravke i tehničku podršku.
Learn how to build machine learning models, collect metrics, and measure performance with ML.NET. Although this sample trains a regression model, the concepts are applicable throughout most of the other algorithms.
The goal of a machine learning model is to identify patterns within training data. These patterns are used to make predictions using new data.
The data can be modeled by a class like HousingData
.
public class HousingData
{
[LoadColumn(0)]
public float Size { get; set; }
[LoadColumn(1, 3)]
[VectorType(3)]
public float[] HistoricalPrices { get; set; }
[LoadColumn(4)]
[ColumnName("Label")]
public float CurrentPrice { get; set; }
}
Given the following data which is loaded into an IDataView
.
HousingData[] housingData = new HousingData[]
{
new HousingData
{
Size = 600f,
HistoricalPrices = new float[] { 100000f ,125000f ,122000f },
CurrentPrice = 170000f
},
new HousingData
{
Size = 1000f,
HistoricalPrices = new float[] { 200000f, 250000f, 230000f },
CurrentPrice = 225000f
},
new HousingData
{
Size = 1000f,
HistoricalPrices = new float[] { 126000f, 130000f, 200000f },
CurrentPrice = 195000f
},
new HousingData
{
Size = 850f,
HistoricalPrices = new float[] { 150000f,175000f,210000f },
CurrentPrice = 205000f
},
new HousingData
{
Size = 900f,
HistoricalPrices = new float[] { 155000f, 190000f, 220000f },
CurrentPrice = 210000f
},
new HousingData
{
Size = 550f,
HistoricalPrices = new float[] { 99000f, 98000f, 130000f },
CurrentPrice = 180000f
}
};
Use the TrainTestSplit
method to split the data into train and test sets. The result is a TrainTestData
object which contains two IDataView
members, one for the train set and the other for the test set. The data split percentage is determined by the testFraction
parameter. The following snippet holds out 20 percent of the original data for the test set.
DataOperationsCatalog.TrainTestData dataSplit = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
IDataView trainData = dataSplit.TrainSet;
IDataView testData = dataSplit.TestSet;
The data needs to be preprocessed before training a machine learning model. More information on data preparation can be found on the data prep how-to article and the transforms page
.
ML.NET algorithms have constraints on input column types. Additionally, default values are used for input and output column names when no values are specified.
The machine learning algorithms in ML.NET expect a float vector of known size as input. Apply the VectorType
attribute to your data model when all of the data is already in numerical format and is intended to be processed together (that is, image pixels).
If data isn't all numerical and you want to apply different data transformations on each of the columns individually, use the Concatenate
method after all of the columns have been processed to combine all of the individual columns into a single feature vector that is output to a new column.
The following snippet combines the Size
and HistoricalPrices
columns into a single feature vector that is output to a new column called Features
. Because there's a difference in scales, NormalizeMinMax
is applied to the Features
column to normalize the data.
// Define Data Prep Estimator
// 1. Concatenate Size and Historical into a single feature vector output to a new column called Features
// 2. Normalize Features vector
IEstimator<ITransformer> dataPrepEstimator =
mlContext.Transforms.Concatenate("Features", "Size", "HistoricalPrices")
.Append(mlContext.Transforms.NormalizeMinMax("Features"));
// Create data prep transformer
ITransformer dataPrepTransformer = dataPrepEstimator.Fit(trainData);
// Apply transforms to training data
IDataView transformedTrainingData = dataPrepTransformer.Transform(trainData);
ML.NET algorithms use default column names when none are specified. All trainers have a parameter called featureColumnName
for the inputs of the algorithm and when applicable they also have a parameter for the expected value called labelColumnName
. By default those values are Features
and Label
respectively.
By using the Concatenate
method during preprocessing to create a new column called Features
, there's no need to specify the feature column name in the parameters of the algorithm since it already exists in the preprocessed IDataView
. The label column is CurrentPrice
, but since the ColumnName
attribute is used in the data model, ML.NET renames the CurrentPrice
column to Label
which removes the need to provide the labelColumnName
parameter to the machine learning algorithm estimator.
If you don't want to use the default column names, pass in the names of the feature and label columns as parameters when defining the machine learning algorithm estimator as demonstrated by the subsequent snippet:
var UserDefinedColumnSdcaEstimator = mlContext.Regression.Trainers.Sdca(labelColumnName: "MyLabelColumnName", featureColumnName: "MyFeatureColumnName");
By default, when data is processed, it's lazily loaded or streamed, which means that trainers might load the data from disk and iterate over it multiple times during training. Therefore, caching is recommended for datasets that fit into memory to reduce the number of times data is loaded from disk. Caching is done as part of an EstimatorChain
by using AppendCacheCheckpoint
.
It's recommended to use AppendCacheCheckpoint
before any trainers in the pipeline.
Using the following EstimatorChain
, adding AppendCacheCheckpoint
before the StochasticDualCoordinateAscent
trainer caches the results of the previous estimators for later use by the trainer.
// 1. Concatenate Size and Historical into a single feature vector output to a new column called Features
// 2. Normalize Features vector
// 3. Cache prepared data
// 4. Use Sdca trainer to train the model
IEstimator<ITransformer> dataPrepEstimator =
mlContext.Transforms.Concatenate("Features", "Size", "HistoricalPrices")
.Append(mlContext.Transforms.NormalizeMinMax("Features"))
.AppendCacheCheckpoint(mlContext);
.Append(mlContext.Regression.Trainers.Sdca());
Once the data is preprocessed, use the Fit
method to train the machine learning model with the StochasticDualCoordinateAscent
regression algorithm.
// Define StochasticDualCoordinateAscent regression algorithm estimator
var sdcaEstimator = mlContext.Regression.Trainers.Sdca();
// Build machine learning model
var trainedModel = sdcaEstimator.Fit(transformedTrainingData);
After the model has been trained, extract the learned ModelParameters
for inspection or retraining. The LinearRegressionModelParameters
provide the bias and learned coefficients or weights of the trained model.
var trainedModelParameters = trainedModel.Model as LinearRegressionModelParameters;
Napomena
Other models have parameters that are specific to their tasks. For example, the K-Means algorithm puts data into cluster based on centroids and the KMeansModelParameters
contains a property that stores these learned centroids. To learn more, visit the Microsoft.ML.Trainers
API Documentation and look for classes that contain ModelParameters
in their name.
To help choose the best performing model, it's essential to evaluate its performance on test data. Use the Evaluate
method, to measure various metrics for the trained model.
Napomena
The Evaluate
method produces different metrics depending on which machine learning task was performed. For more details, visit the Microsoft.ML.Data
API Documentation and look for classes that contain Metrics
in their name.
// Measure trained model performance
// Apply data prep transformer to test data
IDataView transformedTestData = dataPrepTransformer.Transform(testData);
// Use trained model to make inferences on test data
IDataView testDataPredictions = trainedModel.Transform(transformedTestData);
// Extract model metrics and get RSquared
RegressionMetrics trainedModelMetrics = mlContext.Regression.Evaluate(testDataPredictions);
double rSquared = trainedModelMetrics.RSquared;
In the previous code sample:
Evaluate
method, the values in the CurrentPrice
column of the test data set are compared against the Score
column of the newly output predictions to calculate the metrics for the regression model, one of which, R-Squared is stored in the rSquared
variable.Napomena
In this small example, the R-Squared is a number not in the range of 0-1 because of the limited size of the data. In a real-world scenario, you should expect to see a value between 0 and 1.
Povratne informacije za .NET
.NET je projekat otvorenog koda. Izaberite vezu da biste pružili povratne informacije:
Događaj
Izgradite inteligentne aplikacije
17. mar 23 - 21. mar 23
Pridružite se seriji sastanaka kako biste izgradili skalabilna AI rešenja zasnovana na stvarnim slučajevima korišćenja sa kolegama programerima i stručnjacima.
Registrujte se odmahObuka
Modul
Train and understand regression models in machine learning - Training
Regression is arguably the most widely used machine learning technique, commonly underlying scientific discoveries, business planning, and stock market analytics. This learning material takes a dive into some common regression analyses, both simple and more complex, and provides some insight on how to assess model performance.
Certifikacija
Microsoft Certified: Azure Data Scientist Associate - Certifications
Manage data ingestion and preparation, model training and deployment, and machine learning solution monitoring with Python, Azure Machine Learning and MLflow.
Dokumentacija
Prepare data for building a model - ML.NET
Learn how to use transforms in ML.NET to manipulate and prepare data for additional processing or model building.
Load data from files and other sources - ML.NET
Learn how to load data for processing and training into ML.NET using the API. Data is stored in files, databases, JSON, XML or in-memory collections.
How to use the ML.NET Automated ML (AutoML) API - ML.NET
The ML.NET Automated ML (AutoML) API automates the model building process and generates a model ready for deployment. Learn the options that you can use to configure automated machine learning tasks.