Azure Automated ML (interface): Does k-fold cross-validation in AutoML use just random sampling?

Question

J. Jeong 61

Is k-fold cross-validation in the automated ML interface stratified or random by default?
I have run several automated ML experiments using a training set with only five data points for the least common class (say, class A), and I started to wonder whether each CV fold is guaranteed to contain at least one element from class A when I set k to 4 or 5.
I read the 'Train and validation data' section at the link below
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train
and want to make sure it's okay to use 4-fold or 5-fold CV.
Thanks.

1 answer

Answer 1

Ramr-msft 17,826

Thanks for the question. We don't expose the sampling methods. If possible, can we discuss this offline? Please send an email to AzCommunity@microsoft.com to discuss it further. Creating an AutoML tool is always a balance between automating as much as possible for the user and allowing advanced users deeper control of the process. You can consider pre-processing the data before handing it off to AutoML. A few techniques that have worked and that you may want to consider, or combine with SMOTE:
• Downsample the majority class
• Stratified sampling
• ADASYN for creating synthetic observations
In fact, the SMOTE paper references a few cases where SMOTE combined with downsampling outperforms SMOTE on its own.
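As a rough illustration of the first bullet, downsampling the majority class before handing the data to AutoML could look something like the sketch below. It uses only the Python standard library. The function name `downsample_majority`, the `ratio` parameter, and the toy data are made up for this example and are not part of AutoML or any Azure SDK.

```python
import random
from collections import defaultdict


def downsample_majority(rows, label_of, ratio=1.0, seed=0):
    """Randomly drop rows from larger classes so that no class keeps more
    than ratio * (size of the smallest class) rows. Hypothetical helper."""
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    smallest = min(len(group) for group in by_class.values())
    cap = max(1, int(ratio * smallest))

    kept = []
    for group in by_class.values():
        if len(group) > cap:
            # Sample without replacement from the oversized class.
            group = rng.sample(group, cap)
        kept.extend(group)

    rng.shuffle(kept)  # avoid leaving the rows grouped by class
    return kept


# Toy data: 5 rows of class "A" and 95 rows of class "B".
rows = [("A", i) for i in range(5)] + [("B", i) for i in range(95)]
balanced = downsample_majority(rows, label_of=lambda r: r[0], ratio=3.0)

counts = defaultdict(int)
for label, _ in balanced:
    counts[label] += 1
```

With `ratio=3.0`, class B is cut down to at most three times the size of class A, and all class A rows are kept. Whether this helps depends on the data, so treat it as a starting point to experiment with.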
Here is a helpful link on cross-validation folds.
https://github.com/Azure/MachineLearningNotebooks/issues/596
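Since the default sampling is not exposed, one way to be sure that every fold contains the rare class is to assign the folds yourself before training. The sketch below is only an illustration of stratified fold assignment in plain Python. It is not how AutoML builds its folds internally, and the helper names (`stratified_fold_ids`, `class_count_per_fold`) are invented for this example.

```python
import random
from collections import defaultdict


def stratified_fold_ids(labels, k, seed=0):
    """Return one fold id (0..k-1) per row, spreading each class
    round-robin over the folds. Hypothetical helper."""
    rng = random.Random(seed)

    # Collect the row indices of each class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_ids = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Random starting fold so leftover rows do not always land in fold 0.
        offset = rng.randrange(k)
        for pos, idx in enumerate(indices):
            fold_ids[idx] = (pos + offset) % k
    return fold_ids


def class_count_per_fold(labels, fold_ids, k, target):
    """Count how many rows of class `target` fall into each fold."""
    counts = [0] * k
    for label, fold in zip(labels, fold_ids):
        if label == target:
            counts[fold] += 1
    return counts


# Toy data matching the question: 5 rows of class "A", 95 of class "B".
labels = ["A"] * 5 + ["B"] * 95

folds_5 = stratified_fold_ids(labels, k=5)
folds_4 = stratified_fold_ids(labels, k=4)

a_per_fold_5 = class_count_per_fold(labels, folds_5, 5, "A")
a_per_fold_4 = class_count_per_fold(labels, folds_4, 4, "A")
```

With five class A rows, this assignment puts exactly one of them in each fold for k = 5, and at least one in each fold for k = 4. A purely random assignment gives no such guarantee. If you use the SDK rather than the interface, fold assignments like these could in principle be supplied as a custom validation split, but please check the AutoML documentation for the exact format it expects.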

Ramr-msft 17,826 Reputation points

2020-07-13T11:48:00.36+00:00

@JiinJeong-9636 Could you please send an email so we can discuss offline the default sampling used in AutoML?
J. Jeong 61 Reputation points

2020-07-14T07:31:10.607+00:00

Oh, thanks.
What's your contact email address?
Ramr-msft 17,826 Reputation points

2020-07-15T03:38:54.797+00:00

@J. Jeong Thanks. You can send an email to AzCommunity@microsoft.com to discuss this.
