For imbalanced data, AUC Weighted is the preferred choice. The user should also pick metrics that work well under imbalance for model evaluation, e.g. F1 score, micro-averaged AUC, or balanced accuracy. For the primary metric (the metric used for model optimization), the user should preferably choose AUC Weighted instead of accuracy.
Currently, ml.azure.com supports the primary metrics documented in the link below. The request to add an F1 score primary metric has been forwarded to the product team for review.
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train#primary-metric
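As a sketch of the advice above, using the Azure ML Python SDK v1 (`azureml-train-automl` package): the primary metric is set via the `primary_metric` argument of `AutoMLConfig`. Here `training_data` and the label column name are placeholders you would replace with your own workspace objects; F1-related scores still appear among each child run's run metrics even though F1 is not selectable as the primary metric.

```python
# Hedged sketch, Azure ML Python SDK v1 (azureml-train-automl).
# `training_data` (a TabularDataset) and "label" are placeholders
# for your own dataset and target column.
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="AUC_weighted",   # optimization target; robust to class imbalance
    training_data=training_data,     # placeholder: your TabularDataset
    label_column_name="label",       # placeholder: your target column
    n_cross_validations=5,
)
```

After the experiment completes, you can still rank the child runs by weighted F1 manually from the run metrics on the results page.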
Azure Automated ML (interface): choosing primary metrics to handle imbalanced data
I noticed that there are some primary metrics I can choose when I run an automated ML experiment, yet the number of primary metrics is smaller than the number of run metrics on the results page. I want to deal with imbalanced data (10:1 or 20:1) and looked at the links below:
https://learn.microsoft.com/en-us/azure/machine-learning/concept-manage-ml-pitfalls#identify-models-with-imbalanced-data
and
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train
It seems the F1 score is recommended for evaluating each model on imbalanced data.
Here are my questions:
- Is there any way to set the F1 score, or multiple measures, as the primary metric?
- If there is no such way, should I evaluate the models manually?
- Of all the available primary metrics, which is the most appropriate for building a classification model on imbalanced data?
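To make concrete why accuracy is a poor primary metric at this class ratio, here is a small stdlib-only illustration (not Azure-specific; both metric functions are hand-rolled toys for the example): a classifier that always predicts the majority class on 10:1 data scores ~91% accuracy while completely missing the minority class.

```python
# Toy illustration: why plain accuracy misleads on ~10:1 imbalanced data.
# A "model" that always predicts the majority class looks accurate but
# has zero recall on the minority class.

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recall; heavily penalizes ignoring the minority class.
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# 100 samples at roughly 10:1 imbalance; the model always predicts class 0.
y_true = [0] * 91 + [1] * 9
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.91 -- looks good
print(balanced_accuracy(y_true, y_pred))  # 0.5  -- reveals the problem
```

The same effect is why the docs steer you toward AUC Weighted, F1, or balanced accuracy rather than accuracy when classes are skewed.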
Thanks.
Ramr-msft 17,746 Reputation points
2020-06-30T09:20:12.347+00:00