The evaluation metrics for models are generated using the test() method of
nimbusml.Pipeline.
The type of metrics to generate is inferred automatically by looking at the trainer type in the
pipeline. If a model has been loaded using the load_model() method, then the evaltype must
be specified explicitly.
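For example, a minimal sketch of both cases (X_train, y_train, X_test, y_test and the 'model.zip' path are placeholders, and the exact return shape of test() may vary slightly between versions):

```python
from nimbusml import Pipeline
from nimbusml.linear_model import LogisticRegressionClassifier

# Train a pipeline; test() can then infer the metric type from the trainer.
pipeline = Pipeline([LogisticRegressionClassifier()])
pipeline.fit(X_train, y_train)                 # X_*/y_* are placeholder data
metrics, scores = pipeline.test(X_test, y_test)

# Persist and reload the model. After load_model() the trainer type is no
# longer known, so evaltype has to be passed explicitly.
pipeline.save_model('model.zip')
loaded = Pipeline()
loaded.load_model('model.zip')
metrics, scores = loaded.test(X_test, y_test, evaltype='multiclass')
```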
Note about ROC: The computed AUC is defined as the probability that the score for a positive example is higher than the score for a negative one (see AucAggregator.cs in ML.NET). This expression is asymptotically equivalent to the area under the ROC curve, which is what scikit-learn computes (see auc). That explains discrepancies on small test sets.
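The two viewpoints can be compared directly with a small illustration (plain scikit-learn/NumPy, not NimbusML; the labels and scores are made up):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels and scores, invented for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.5, 0.7])

# Pairwise definition: probability that a random positive example scores
# higher than a random negative one (ties count as 0.5).
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
diffs = pos[:, None] - neg[None, :]
auc_pairwise = ((diffs > 0).sum() + 0.5 * (diffs == 0).sum()) / diffs.size

# Area under the ROC curve as computed by scikit-learn.
auc_area = roc_auc_score(y_true, y_score)

print(auc_pairwise, auc_area)  # the two values agree on this toy example
```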
Multiclass Classification Metrics
This corresponds to evaltype='multiclass'.
Accuracy(micro-avg) - Every sample-class pair contributes equally to the accuracy metric.
Accuracy(macro-avg) - Every class contributes equally to the accuracy metric; minority classes are given the same weight as the larger classes. (A short computation contrasting the two averages is shown after this list.)
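The difference between the two averages can be seen with a small hand computation (plain NumPy, not NimbusML output; one common definition of the macro average is the mean of per-class recall, and the labels below are invented):

```python
import numpy as np

# Hypothetical predictions for a 3-class problem with a small class 2.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1])

# Micro-average accuracy: every example counts equally.
acc_micro = (y_true == y_pred).mean()                        # 8/10 = 0.8

# Macro-average accuracy: per-class accuracy, averaged so that each class
# counts equally regardless of its size.
classes = np.unique(y_true)
per_class = [(y_pred[y_true == c] == c).mean() for c in classes]
acc_macro = np.mean(per_class)                               # (1 + 1 + 0)/3 ~ 0.67

print(acc_micro, acc_macro)  # the small class 2 drags the macro average down
```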
Anomaly Detection Metrics
This corresponds to evaltype='anomaly'. (A worked example of the detection-rate definitions follows the list below.)
DR @K FP - Detection rate at k false positives. When the test examples are sorted by the output of the anomaly detector in descending order, denote by K the index of the k-th example whose label is 0. Detection rate at k false positives is the detection rate at K.
DR @P FPR - Detection rate at fraction p false positives. When the test examples are sorted by the output of the anomaly detector in descending order, denote by K the index such that a fraction p of the label-0 examples are above K. Detection rate at fraction p false positives is the detection rate at K.
DR @NumPos - Detection rate at number of anomalies. Denote by D the number of label-1 examples in the test set. Detection rate at number of anomalies is equal to the detection rate at D.
NumAnomalies - Total number of anomalies detected.
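To make the detection-rate definitions concrete, here is a small hand-rolled sketch (plain NumPy, not NimbusML code; the scores and labels are invented) that computes DR @K FP for k = 1:

```python
import numpy as np

# Invented anomaly scores (higher = more anomalous) and labels (1 = anomaly).
scores = np.array([0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20])
labels = np.array([1,    1,    0,    1,    0,    1,    0,    0])

k = 1  # number of false positives tolerated

# Sort examples by detector output, descending.
order = np.argsort(-scores)
sorted_labels = labels[order]

# K is the position of the k-th label-0 example in this ordering.
neg_positions = np.flatnonzero(sorted_labels == 0)
K = neg_positions[k - 1]

# Detection rate at K: fraction of all anomalies ranked above position K.
detected = sorted_labels[:K].sum()
dr_at_k_fp = detected / labels.sum()
print(dr_at_k_fp)  # 2 anomalies above the first false positive -> 2/4 = 0.5
```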