Evaluation Metrics

Metrics from Pipeline.test()

The evaluation metrics for models are generated using the test() method of nimbusml.Pipeline.

The type of metrics to generate is inferred automatically by looking at the trainer type in the pipeline. If a model has been loaded using the load_model() method, then the evaltype must be specified explicitly.
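For illustration, a minimal sketch of both cases. The trainer choice, toy data, and model path are placeholders, and test() is assumed to return a (metrics, scores) tuple as in the nimbusml samples:

```python
import pandas as pd
from nimbusml import Pipeline
from nimbusml.linear_model import LogisticRegressionBinaryClassifier

# Toy data (placeholder); any numeric feature columns work.
X = pd.DataFrame({'f0': [0.1, 0.9, 0.2, 0.8], 'f1': [1.0, 0.0, 1.0, 0.0]})
y = pd.Series([0, 1, 0, 1], name='label')

# Case 1: the trainer is in the pipeline, so the metric type is inferred.
pipeline = Pipeline([LogisticRegressionBinaryClassifier()])
pipeline.fit(X, y)
metrics, _ = pipeline.test(X, y)

# Case 2: model loaded from disk; the trainer type is unknown,
# so evaltype must be passed explicitly.
loaded = Pipeline()
loaded.load_model('model.zip')  # placeholder path
metrics, _ = loaded.test(X, y, evaltype='binary')
```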

Binary Classification Metrics

This corresponds to evaltype='binary'.

AUC - see Receiver Operating Characteristic

Accuracy - see Precision and Recall

Positive Precision - see Precision and Recall

Positive Recall - see Precision and Recall

Negative Precision - see Precision and Recall

Negative Recall - see Precision and Recall

Log-loss - see Log Loss

Log-loss reduction - RIG(Y|X) * 100 = (H(Y) - H(Y|X)) / H(Y) * 100. Ranges over [-inf, 100], where 100 indicates perfect predictions and 0 indicates predictions no better than the label prior (a worked sketch follows this list).

Test-set Entropy - H(Y)

F1 Score - see Precision and Recall

AUPRC - see Area under Precision-Recall Curve
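As referenced above, a small sketch of the log-loss reduction computation (pure numpy/sklearn for clarity; the ratio is invariant to the logarithm base, so any base gives the same reduction):

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1, 0])
p_pred = np.array([0.9, 0.2, 0.7, 0.6, 0.1])  # model probabilities for class 1

# H(Y): log loss of always predicting the label prior (the test-set entropy).
prior = y_true.mean()
h_y = log_loss(y_true, np.full(len(y_true), prior))

# H(Y|X): log loss of the model's predicted probabilities.
h_y_x = log_loss(y_true, p_pred)

# Log-loss reduction: (H(Y) - H(Y|X)) / H(Y) * 100.
print((h_y - h_y_x) / h_y * 100)  # > 0: better than the prior; 100: perfect
```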

Note about ROC: The computed AUC is defined as the probability that the score for a positive example is higher than the score for a negative one (see AucAggregator.cs in ML.NET). This expression is asymptotically equivalent to the area under the ROC curve, which is what scikit-learn computes (see auc). That explains discrepancies on small test sets.
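A sketch of that pairwise definition, compared against scikit-learn's trapezoidal area (with ties counted as half-wins the two agree on this toy data; ML.NET's streaming aggregation can still diverge slightly on small test sets):

```python
from itertools import product
from sklearn.metrics import roc_auc_score

labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

pos = [s for s, l in zip(scores, labels) if l == 1]
neg = [s for s, l in zip(scores, labels) if l == 0]

# Pairwise definition: P(score of a positive > score of a negative),
# counting ties as half a win.
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
           for p, n in product(pos, neg))
auc_pairwise = wins / (len(pos) * len(neg))

# Trapezoidal area under the ROC curve, as scikit-learn computes it.
print(auc_pairwise, roc_auc_score(labels, scores))
```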

Multiclass Classification Metrics

This corresponds to evaltype='multiclass'.

Accuracy(micro-avg) - Every sample-class pair contributes equally to the accuracy metric.

Accuracy(macro-avg) - Every class contributes equally to the accuracy metric; minority classes are given the same weight as the larger classes (see the sketch after this list).

Log-loss - see Log Loss

Log-loss reduction - RIG(Y|X) * 100 = (H(Y) - H(Y|X)) / H(Y) * 100. Ranges over [-inf, 100], where 100 indicates perfect predictions and 0 indicates predictions no better than the label prior.

(class N) - Accuracy of class N
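A numpy sketch of the micro/macro distinction above (toy labels; this mirrors the stated definitions rather than ML.NET's exact aggregation code):

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2])

# Micro-average: every sample counts equally.
micro = (y_true == y_pred).mean()

# Macro-average: per-class accuracy (the "(class N)" values), then an
# unweighted mean over classes, so small classes weigh as much as large ones.
classes = np.unique(y_true)
per_class = [(y_pred[y_true == c] == c).mean() for c in classes]
macro = np.mean(per_class)

print(micro, per_class, macro)  # 5/7, [0.75, 0.5, 1.0], 0.75
```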

Regression Metrics

This corresponds to evaltype='regression'.

L1(avg) - E(|y - y'|)

L2(avg) - E((y - y')^2)

RMS(avg) - E((y - y')^2)^0.5

Loss-fn(avg) - Expected value of the loss function. When using square loss, this equals L2(avg).
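A numpy sketch of the three formulas above (y' denotes the prediction):

```python
import numpy as np

y = np.array([3.0, -0.5, 2.0, 7.0])       # true values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # predictions

l1 = np.mean(np.abs(y - y_pred))           # L1(avg)
l2 = np.mean((y - y_pred) ** 2)            # L2(avg)
rms = np.sqrt(l2)                          # RMS(avg) = L2(avg) ** 0.5

print(l1, l2, rms)  # 0.5, 0.375, ~0.612
```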

Clustering Metrics

This corresponds to evaltype='cluster'.

NMI - Normalized mutual information, a measure of the mutual dependence between the cluster assignments and the true labels. See Normalized Variants. Range is [0, 1], where higher is better.

AvgMinScore - Mean distance of samples to their nearest centroid. Smaller is better.
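A minimal sketch with a clustering trainer; the toy data is made up and the parameter names follow the nimbusml samples:

```python
import pandas as pd
from nimbusml import Pipeline
from nimbusml.cluster import KMeansPlusPlus

# Two well-separated blobs (placeholder data).
X = pd.DataFrame({'x0': [0.0, 0.1, 5.0, 5.1], 'x1': [0.0, 0.2, 5.0, 4.9]})
y = pd.Series([0, 0, 1, 1], name='label')  # true labels, used only for NMI

pipeline = Pipeline([KMeansPlusPlus(n_clusters=2)])
pipeline.fit(X)

# evaltype is inferred as 'cluster' from the trainer; the metrics frame
# should contain the NMI and AvgMinScore columns described above.
metrics, _ = pipeline.test(X, y)
print(metrics)
```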

Ranking Metrics

This corresponds to evaltype='ranking'.

NDCG@N - Normalized Discounted Cumulative Gain @ Top N positions. See Discounted Cumulative Gain

DCG@N - Discounted Cumulative Gain @ Top N positions. See Discounted Cumulative Gain
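For concreteness, a sketch of one common DCG@N formulation (exponential gain with a log2 position discount; the exact gain/discount ML.NET uses may differ):

```python
import numpy as np

def dcg_at_n(gains, n):
    # DCG@N: sum of (2^gain - 1) / log2(position + 1) over the top N positions.
    gains = np.asarray(gains, dtype=float)[:n]
    discounts = np.log2(np.arange(2, gains.size + 2))
    return np.sum((2 ** gains - 1) / discounts)

relevance = [3, 2, 3, 0, 1]               # relevance labels in ranked order
ideal = sorted(relevance, reverse=True)   # best possible ordering

dcg = dcg_at_n(relevance, 3)
ndcg = dcg / dcg_at_n(ideal, 3)           # NDCG@3: normalized by the ideal DCG
print(dcg, ndcg)
```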

Anomaly Detection Metrics

This corresponds to evaltype='anomaly'.

AUC - see Receiver Operating Characteristic

DR @K FP - Detection rate at k false positives. When the test examples are sorted by the output of the anomaly detector in descending order, denote by K the index of the k-th example whose label is 0. Detection rate at k false positives is the detection rate at K.

DR @P FPR - Detection rate at fraction p false positives. When the test examples are sorted by the output of the anomaly detector in descending order, denote by K the index such that a fraction p of the label 0 examples are above K. Detection rate at fraction p false positives is the detection rate at K.

DR @NumPos - Detection rate at number of anomalies. Denote by D the number of label 1 examples in the test set. Detection rate at number of anomalies is equal to the detection rate at D.

NumAnomalies - Total number of anomalies detected.
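A sketch of one plausible reading of the DR @K FP definition above (toy scores; "detection rate at K" is taken as the fraction of all anomalies ranked above the k-th false positive):

```python
import numpy as np

def detection_rate_at_k_fp(labels, scores, k):
    # Sort examples by anomaly score, descending.
    order = np.argsort(-np.asarray(scores))
    sorted_labels = np.asarray(labels)[order]
    # Position of the k-th label-0 (normal) example in that ordering.
    normal_positions = np.where(sorted_labels == 0)[0]
    cutoff = normal_positions[k - 1]
    # Fraction of all anomalies (label 1) found above that cutoff.
    return sorted_labels[:cutoff].sum() / sorted_labels.sum()

labels = [1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
print(detection_rate_at_k_fp(labels, scores, k=2))  # 2/3 of anomalies found
```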