Interpret model predictions using Permutation Feature Importance

Raksts
19.12.2024

Using Permutation Feature Importance (PFI), learn how to interpret ML.NET machine learning model predictions. PFI gives the relative contribution each feature makes to a prediction.

Machine learning models are often thought of as opaque boxes that take inputs and generate an output. The intermediate steps or interactions among the features that influence the output are rarely understood. As machine learning is introduced into more aspects of everyday life, such as healthcare, it's of utmost importance to understand why a machine learning model makes the decisions it does. For example, if diagnoses are made by a machine learning model, healthcare professionals need a way to look into the factors that went into making that diagnosis. Providing the right diagnosis could make a great difference on whether a patient has a speedy recovery or not. Therefore the higher the level of explainability in a model, the greater confidence healthcare professionals have to accept or reject the decisions made by the model.

Various techniques are used to explain models, one of which is PFI. PFI is a technique used to explain classification and regression models that's inspired by Breiman's Random Forests paper (see section 10). At a high level, the way it works is by randomly shuffling data one feature at a time for the entire dataset and calculating how much the performance metric of interest decreases. The larger the change, the more important that feature is.

Additionally, by highlighting the most important features, model builders can focus on using a subset of more meaningful features, which can potentially reduce noise and training time.

Load the data

The features in the dataset used for this sample are in columns 1-12. The goal is to predict Price.

Column	Feature	Description
1	CrimeRate	Per capita crime rate
2	ResidentialZones	Residential zones in town
3	CommercialZones	Nonresidential zones in town
4	NearWater	Proximity to body of water
5	ToxicWasteLevels	Toxicity levels (PPM)
6	AverageRoomNumber	Average number of rooms in house
7	HomeAge	Age of home
8	BusinessCenterDistance	Distance to nearest business district
9	HighwayAccess	Proximity to highways
10	TaxRate	Property tax rate
11	StudentTeacherRatio	Ratio of students to teachers
12	PercentPopulationBelowPoverty	Percent of population living below poverty
13	Price	Price of the home

A sample of the dataset is shown here:

text

1,24,13,1,0.59,3,96,11,23,608,14,13,32
4,80,18,1,0.37,5,14,7,4,346,19,13,41
2,98,16,1,0.25,10,5,1,8,689,13,36,12

The data in this sample can be modeled by a class like HousingPriceData and loaded into an IDataView.

class HousingPriceData
{
    [LoadColumn(0)]
    public float CrimeRate { get; set; }

    [LoadColumn(1)]
    public float ResidentialZones { get; set; }

    [LoadColumn(2)]
    public float CommercialZones { get; set; }

    [LoadColumn(3)]
    public float NearWater { get; set; }

    [LoadColumn(4)]
    public float ToxicWasteLevels { get; set; }

    [LoadColumn(5)]
    public float AverageRoomNumber { get; set; }

    [LoadColumn(6)]
    public float HomeAge { get; set; }

    [LoadColumn(7)]
    public float BusinessCenterDistance { get; set; }

    [LoadColumn(8)]
    public float HighwayAccess { get; set; }

    [LoadColumn(9)]
    public float TaxRate { get; set; }

    [LoadColumn(10)]
    public float StudentTeacherRatio { get; set; }

    [LoadColumn(11)]
    public float PercentPopulationBelowPoverty { get; set; }

    [LoadColumn(12)]
    [ColumnName("Label")]
    public float Price { get; set; }
}

Train the model

The following code sample illustrates the process of training a linear regression model to predict house prices.

// 1. Get the column name of input features.
string[] featureColumnNames =
    data.Schema
        .Select(column => column.Name)
        .Where(columnName => columnName != "Label").ToArray();

// 2. Define training pipeline.
IEstimator<ITransformer> sdcaEstimator =
    mlContext.Transforms.Concatenate("Features", featureColumnNames)
        .Append(mlContext.Transforms.NormalizeMinMax("Features"))
        .Append(mlContext.Regression.Trainers.Sdca());

// 3. Train machine learning model.
var sdcaModel = sdcaEstimator.Fit(data);

Explain the model with Permutation Feature Importance (PFI)

In ML.NET, use the PermutationFeatureImportance method for your respective task.

// Use the model to make predictions
var transformedData = sdcaModel.Transform(data);

// Calculate feature importance
ImmutableArray<RegressionMetricsStatistics> permutationFeatureImportance =
    mlContext
        .Regression
        .PermutationFeatureImportance(sdcaModel, transformedData, permutationCount:3);

The result of using PermutationFeatureImportance on the training dataset is an ImmutableArray of RegressionMetricsStatistics objects. RegressionMetricsStatistics provides summary statistics like mean and standard deviation for multiple observations of RegressionMetrics equal to the number of permutations specified by the permutationCount parameter.

The metric used to measure feature importance depends on the machine learning task used to solve your problem. For example, regression tasks might use a common evaluation metric such as R-squared to measure importance. For more information on model evaluation metrics, see evaluate your ML.NET model with metrics.

The importance, or in this case, the absolute average decrease in R-squared metric, calculated by PermutationFeatureImportance, can then be ordered from most important to least important.

// Order features by importance.
var featureImportanceMetrics =
    permutationFeatureImportance
        .Select((metric, index) => new { index, metric.RSquared })
        .OrderByDescending(myFeatures => Math.Abs(myFeatures.RSquared.Mean));

Console.WriteLine("Feature\tPFI");

foreach (var feature in featureImportanceMetrics)
{
    Console.WriteLine($"{featureColumnNames[feature.index],-20}|\t{feature.RSquared.Mean:F6}");
}

Printing the values for each of the features in featureImportanceMetrics generates output similar to the output that follows. You should expect to see different results because these values vary based on the data that they're given.

Feature	Change to R-Squared
HighwayAccess	-0.042731
StudentTeacherRatio	-0.012730
BusinessCenterDistance	-0.010491
TaxRate	-0.008545
AverageRoomNumber	-0.003949
CrimeRate	-0.003665
CommercialZones	0.002749
HomeAge	-0.002426
ResidentialZones	-0.002319
NearWater	0.000203
PercentPopulationLivingBelowPoverty	0.000031
ToxicWasteLevels	-0.000019

If you look at the five most important features for this dataset, the price of a house predicted by this model is influenced by its proximity to highways, student teacher ratio of schools in the area, proximity to major employment centers, property tax rate, and average number of rooms in the home.

Next steps

Sadarbojieties ar mums vietnē GitHub

Šī satura avotu var atrast vietnē GitHub, kur varat arī izveidot un pārskatīt problēmas un atgādāšanas pieprasījumus. Lai iegūtu papildinformāciju, skatiet mūsu līdzstrādnieku rokasgrāmatu.

.NET atsauksmes

.NET ir atklātā pirmkoda projekts. Atlasiet saiti, lai sniegtu atsauksmes:

Atvērtās dokumentācijas jautājums Sniegt atsauksmes par produktu

Papildu resursi

Dokumentācija

Train a machine learning model using cross validation - ML.NET

Learn how to use cross validation to build more robust machine learning models in ML.NET. Cross-validation is a training and model evaluation technique that splits the data into several partitions and trains multiple algorithms on these partitions.
Make predictions with a trained model - ML.NET

Learn to make predictions using a trained ML.NET model
Train and evaluate a model - ML.NET

Learn how to build machine learning models, collect metrics, and measure performance with ML.NET. A machine learning model identifies patterns within training data to make predictions using new data.
Inspect intermediate data during ML.NET processing - ML.NET

Learn how to inspect intermediate data during ML.NET machine learning pipeline loading, processing, and model training steps in ML.NET.
How to use the ML.NET Automated ML (AutoML) API - ML.NET

The ML.NET Automated ML (AutoML) API automates the model building process and generates a model ready for deployment. Learn the options that you can use to configure automated machine learning tasks.
Save and load trained models - ML.NET

Learn how to save and load trained models
Retrain a model - ML.NET

Learn how to retrain a model in ML.NET
Prepare data for building a model - ML.NET

Learn how to use transforms in ML.NET to manipulate and prepare data for additional processing or model building.

Apmācība

Modulis

Refine and test machine learning models - Training

When we think of machine learning, we often focus on the training process. A small amount of preparation before this process can not only speed up and improve learning, but also give us some confidence about how well our models will work when faced with data we have never seen before.

Notikumi

Veidojiet inteliģentas lietotnes

17. marts 21 - 21. marts 10

Pievienojieties meetup sērijai, lai kopā ar citiem izstrādātājiem un ekspertiem izveidotu mērogojamus AI risinājumus, kuru pamatā ir reālas lietošanas gadījumi.

Reģistrēties tūlīt

Kopīgot, izmantojot