Azure Automated ML (interface): Does k-fold cross-validation in AutoML use just random sampling?

J. Jeong 61 Reputation points
2020-07-03T06:52:46.663+00:00

Is k-fold cross-validation in automated ML (interface) stratified or random by default?
I have run several automated ML experiments using a training set in which the least common class (say, class A) has only five data points, and I started to wonder whether each CV fold is guaranteed to contain at least one class A element when I set k to 4 or 5.
I read the 'Train and validation data' section at the link below
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train
and want to confirm whether it is okay to use 4-fold or 5-fold CV.
Thanks.
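To see why the splitting scheme matters with only five class A rows, here is a minimal standard-library sketch comparing a plain random k-fold split with a stratified one on a hypothetical dataset of 5 class A rows and 95 class B rows. It does not reproduce what AutoML does internally; the function names (`random_folds`, `stratified_folds`, `minority_per_fold`, `coverage_rate`) and the round-robin fold assignment are illustrative assumptions.

```python
import random
from collections import defaultdict


def random_folds(labels, k, seed=0):
    """Plain random k-fold: shuffle all row indices, then deal them round-robin into k folds."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    folds = [[] for _ in range(k)]
    for pos, i in enumerate(idx):
        folds[pos % k].append(i)
    return folds


def stratified_folds(labels, k, seed=0):
    """Stratified k-fold: shuffle within each class, then deal each class round-robin.
    The round-robin position carries over between classes, so fold sizes differ by at
    most one and every class with at least k rows lands in every fold."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    return folds


def minority_per_fold(labels, folds, cls):
    """Number of rows of class `cls` in each validation fold."""
    return [sum(labels[i] == cls for i in fold) for fold in folds]


def coverage_rate(labels, k, cls, trials=1000):
    """Fraction of random seeds for which plain random k-fold puts `cls` in every fold."""
    hits = 0
    for seed in range(trials):
        counts = minority_per_fold(labels, random_folds(labels, k, seed), cls)
        hits += all(c > 0 for c in counts)
    return hits / trials


labels = ["A"] * 5 + ["B"] * 95  # hypothetical data: 5 minority rows, 95 majority rows
for k in (4, 5):
    # stratified: k=4 -> [2, 1, 1, 1], k=5 -> [1, 1, 1, 1, 1]
    print(k, "stratified:", minority_per_fold(labels, stratified_folds(labels, k), "A"))
    print(k, "random, share of seeds covering every fold:", coverage_rate(labels, k, "A"))
```

With stratification, both k=4 and k=5 put at least one class A row in every fold. With plain random splitting on this hypothetical 100-row dataset, the analytic probability that every fold gets a class A row is only about 0.25 for k=4 and about 0.04 for k=5, so the simulated rates should come out near those values.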

Azure Machine Learning

  1. Ramr-msft 17,826 Reputation points
    2020-07-06T10:45:14.49+00:00

    Thanks for the question. We don't expose the sampling methods. If possible, can we discuss this offline? Please send an email to AzCommunity@microsoft.com to discuss it further. Creating an AutoML tool is always a balance between automating as much as possible for the user and allowing advanced users deeper control of the process. You can consider doing some pre-processing of the data before handing it off to AutoML. A few techniques that have worked, which you may want to consider on their own or combine with SMOTE:
    • Downsample the majority class
    • Stratified sampling
    • ADASYN for creating synthetic observations
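As a rough illustration of the first technique, here is a minimal sketch of random undersampling in plain Python, applied to hypothetical `(feature, label)` rows before the data is handed to AutoML. The function `downsample_majority` and its `label_of` and `ratio` parameters are assumptions for illustration, not part of any Azure API. Resampling should only touch training rows; validation data should keep its natural class distribution.

```python
import random
from collections import Counter, defaultdict


def downsample_majority(rows, label_of, ratio=1.0, seed=0):
    """Random undersampling: keep every row of the smallest class and keep at most
    ratio * (smallest class size) randomly chosen rows of every other class."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1 so the smallest class is never trimmed")
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    cap = int(ratio * min(len(members) for members in by_class.values()))
    rng = random.Random(seed)
    kept = []
    for members in by_class.values():
        kept.extend(rng.sample(members, cap) if len(members) > cap else members)
    rng.shuffle(kept)  # avoid leaving the output grouped by class
    return kept


# Hypothetical training rows: 5 "A" rows and 95 "B" rows.
rows = [(float(i), "A") for i in range(5)] + [(float(i), "B") for i in range(5, 100)]
balanced = downsample_majority(rows, label_of=lambda r: r[1], ratio=4)
print(sorted(Counter(label for _, label in balanced).items()))  # [('A', 5), ('B', 20)]
```

The `ratio` knob is a design choice in this sketch: `ratio=1` forces a fully balanced set, while larger values discard less of the majority class.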
    In fact, the SMOTE paper references a few cases where SMOTE combined with downsampling outperforms SMOTE on its own.
    Here is a helpful link about cross-validation folds:
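For reference, the core idea of SMOTE (Chawla et al., 2002) is to create synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. Below is a minimal pure-Python sketch of that idea on hypothetical 2-D features; the `smote` function here is illustrative, picks base samples at random rather than iterating over each one, and is not the imbalanced-learn implementation. ADASYN is not shown. When combining it with downsampling, apply both only to the training portion of each fold so synthetic points never leak into validation data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE-style oversampling: each synthetic point is placed at a random
    position on the segment between a random minority sample and one of its k nearest
    minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


# Hypothetical 2-D features for the 5 class A rows.
class_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
new_a = smote(class_a, n_new=15, k=3)
print(len(new_a))  # 15
```

Because every synthetic point is a convex combination of two real class A points, it stays inside the region the minority class already occupies.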
    https://github.com/Azure/MachineLearningNotebooks/issues/596
