Evaluation Metrics
The evaluation metrics for models are generated using the test() method of nimbusml.Pipeline. The type of metrics to generate is inferred automatically from the trainer type in the pipeline. If a model has been loaded using the load_model() method, then the evaltype must be specified explicitly.
Binary Classification
This corresponds to evaltype='binary'.
AUC - see Receiver Operating Characteristic
Accuracy - see Precision and Recall
Positive Precision - see Precision and Recall
Positive Recall - see Precision and Recall
Negative Precision - see Precision and Recall
Negative Recall - see Precision and Recall
Log-loss - see Log Loss
Log-loss reduction - RIG(Y|X) * 100 = (H(Y) - H(Y|X)) / H(Y) * 100. Range is [-inf, 100], where 100 indicates perfect predictions and 0 indicates predicting the mean.
Test-set Entropy - H(Y)
F1 Score - see Precision and Recall
AUPRC - see Area under Precision-Recall Curve
Note about ROC: The computed AUC is defined as the probability that the score for a positive example is higher than the score for a negative one (see AucAggregator.cs in ML.NET). This expression is asymptotically equivalent to the area under the ROC curve, which is what scikit-learn computes (see auc). That explains discrepancies on small test sets.
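The probabilistic definition of AUC in the note above can be sketched directly: count, over all positive/negative pairs, how often the positive example outscores the negative one (ties counted as half). The labels and scores below are made up for illustration.

```python
from itertools import product

# Hypothetical test-set labels and anomaly-free classifier scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

# Probability that a random positive outscores a random negative;
# a tie contributes 0.5, matching the usual pairwise AUC convention.
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
           for p, n in product(pos, neg))
auc = wins / (len(pos) * len(neg))
```

On a large test set this pairwise estimate converges to the trapezoidal area under the ROC curve; on a small set like this one the two can differ slightly, which is the discrepancy the note describes.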
Multiclass Classification
This corresponds to evaltype='multiclass'.
Accuracy(micro-avg) - Every sample-class pair contributes equally to the accuracy metric.
Accuracy(macro-avg) - Every class contributes equally to the accuracy metric: minority classes are given the same weight as the larger classes.
Log-loss - see Log Loss
Log-loss reduction - RIG(Y|X) * 100 = (H(Y) - H(Y|X)) / H(Y) * 100. Range is [-inf, 100], where 100 indicates perfect predictions and 0 indicates predicting the mean.
(class N) - Accuracy of class N
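The difference between micro- and macro-averaged accuracy can be sketched with plain Python on made-up predictions (the labels below are illustrative, not from any real model):

```python
from collections import defaultdict

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2, 2, 1]

# Micro-average: every sample counts once, so large classes dominate.
micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Macro-average: per-class accuracy first, then an unweighted mean,
# so every class contributes equally regardless of its size.
correct = defaultdict(int)
total = defaultdict(int)
for t, p in zip(y_true, y_pred):
    total[t] += 1
    correct[t] += (t == p)
macro = sum(correct[c] / total[c] for c in total) / len(total)
```

Here the small class 1 (accuracy 1/2) pulls the macro-average below the micro-average, which is exactly the behavior the two bullets above describe.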
Regression
This corresponds to evaltype='regression'.
L1(avg) - E( | y - y' | )
L2(avg) - E( ( y - y' )^2 )
RMS(avg) - E( ( y - y' )^2 )^0.5
Loss-fn(avg) - Expected value of the loss function. With squared loss, this equals L2(avg).
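The three regression formulas above are straightforward sample means; a minimal sketch on made-up targets and predictions:

```python
import math

y = [3.0, 5.0, 2.5, 7.0]       # hypothetical true targets
y_hat = [2.5, 5.0, 4.0, 8.0]   # hypothetical predictions

n = len(y)
l1 = sum(abs(a - b) for a, b in zip(y, y_hat)) / n        # E(|y - y'|)
l2 = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n      # E((y - y')^2)
rms = math.sqrt(l2)                                       # E((y - y')^2)^0.5
```

Note that RMS is just the square root of L2(avg), so the two always rank models identically; RMS is simply in the same units as y.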
Clustering
This corresponds to evaltype='cluster'.
NMI - measure of the mutual dependence of the variables. See Normalized Variants. Range is in [0,1], where higher is better.
AvgMinScore - Mean distance of samples to centroids. Smaller is better.
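AvgMinScore can be sketched as the mean, over all samples, of the distance from each sample to its nearest centroid. The points and centroids below are made up; Euclidean distance is assumed:

```python
import math

points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 10.0)]
centroids = [(0.5, 0.0), (9.5, 9.5)]  # hypothetical cluster centers

# For each sample, take the distance to the closest centroid, then average.
avg_min_score = sum(min(math.dist(p, c) for c in centroids)
                    for p in points) / len(points)
```

Smaller values mean tighter clusters, consistent with "smaller is better" above.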
Ranking
This corresponds to evaltype='ranking'.
NDCG@N - Normalized Discounted Cumulative Gain @ Top N positions. See Discounted Cumulative Gain
DCG@N - Discounted Cumulative Gain @ Top N positions. See Discounted Cumulative Gain
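DCG@N and NDCG@N can be sketched as follows. This uses one common formulation (gain 2^rel - 1, discount log2(rank + 1)); ML.NET's exact gain/discount convention may differ, and the relevance labels below are hypothetical:

```python
import math

def dcg_at_n(rels, n):
    # Sum the discounted gains of the top-n results in ranked order.
    return sum((2 ** r - 1) / math.log2(i + 2)
               for i, r in enumerate(rels[:n]))

def ndcg_at_n(rels, n):
    # Normalize by the DCG of the ideal (relevance-sorted) ordering,
    # so a perfect ranking scores exactly 1.
    ideal = dcg_at_n(sorted(rels, reverse=True), n)
    return dcg_at_n(rels, n) / ideal if ideal > 0 else 0.0

ranked_rels = [3, 2, 3, 0, 1]   # relevance labels in the model's ranked order
ndcg3 = ndcg_at_n(ranked_rels, 3)
```

Because of the normalization, NDCG@N always lies in [0, 1], which makes it comparable across queries with different numbers of relevant documents.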
Anomaly Detection
This corresponds to evaltype='anomaly'.
AUC - see Receiver Operating Characteristic
DR @K FP - Detection rate at k false positives. When the test examples are sorted by the output of the anomaly detector in descending order, denote by K the index of the k-th example whose label is 0. Detection rate at k false positives is the detection rate at K.
DR @K FPR - Detection rate at fraction p false positives. When the test examples are sorted by the output of the anomaly detector in descending order, denote by K the index such that a fraction p of the label 0 examples are above K. Detection rate at fraction p false positives is the detection rate at K.
DR @NumPos - Detection rate at number of anomalies. Denote by D the number of label 1 examples in the test set. Detection rate at number of anomalies is equal to the detection rate at D.
NumAnomalies - Total number of anomalies detected.
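The DR @K FP definition above can be sketched directly: walk down the score-sorted test set until the k-th false positive appears, and report the fraction of all anomalies found by that point. The label sequence below is a made-up ranking, best score first:

```python
def detection_rate_at_k_fp(labels_sorted, k):
    # labels_sorted: test-set labels (1 = anomaly) ordered by the
    # detector's score, descending. Stop at the k-th label-0 example
    # and return the fraction of all anomalies detected by then.
    total_pos = sum(labels_sorted)
    found, fps = 0, 0
    for y in labels_sorted:
        if y == 1:
            found += 1
        else:
            fps += 1
            if fps == k:
                break
    return found / total_pos

labels = [1, 1, 0, 1, 0, 0, 1, 0]  # hypothetical ranked labels
```

For example, with one allowed false positive the walk stops at the first label-0 example, having found 2 of the 4 anomalies; allowing a second false positive picks up a third anomaly.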