In Azure AutoML, does data splitting (train/test and validation) happen after the featurization and imputation stage, or is a clean holdout set aside before the imputation logic is applied?

Saurabh Dhoble 20 Reputation points
2024-01-30T17:10:46.3766667+00:00

Trying to understand at what stage data splitting happens.

Azure Machine Learning
An Azure machine learning service for building and deploying models.

Accepted answer
  1. dupammi 8,615 Reputation points Microsoft External Staff
    2024-01-31T07:04:06.3033333+00:00

    Hi @Saurabh Dhoble

    Thank you for contacting Microsoft Q&A.

    In Azure AutoML, the split into training and validation data can occur either before or after the featurization and imputation stages. If the split is done before cleaning and imputation, then cleaning and imputation must be applied to every split.

    If the split is done after the whole dataset has been cleaned and imputed, then the resulting splits are already in a cleaned and imputed state.
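
    To make the difference between the two orders concrete, here is a minimal pure-Python sketch. It is not AutoML code and does not show what AutoML does internally. The toy column, the fixed 6/2 split, and the helper names `fit_mean` and `impute` are all assumptions made up for illustration, and mean imputation simply stands in for whatever imputer is actually used.

```python
from statistics import mean

def fit_mean(values):
    """Return the mean of the observed (non-None) values."""
    return mean(v for v in values if v is not None)

def impute(values, fill):
    """Replace every None in values with the given fill value."""
    return [fill if v is None else v for v in values]

# Hypothetical numeric column with two missing entries.
data = [1.0, 2.0, None, 4.0, 3.0, 5.0, 100.0, None]

# Order A: split first, then fit the imputer on the training rows only
# and apply that same fill value to both splits.
train, test = data[:6], data[6:]
fill_a = fit_mean(train)
train_a = impute(train, fill_a)
test_a = impute(test, fill_a)

# Order B: impute the whole column first, then split.
fill_b = fit_mean(data)
imputed_all = impute(data, fill_b)
train_b, test_b = imputed_all[:6], imputed_all[6:]

print("split, then impute:", fill_a, train_a, test_a)
print("impute, then split:", round(fill_b, 2), train_b, test_b)
```

    In this toy case the outlier 100.0 sits in the holdout rows. Order A fills the gaps with 3.0, while order B fills them with about 19.17, so in order B the training rows already reflect a value that belongs to the holdout.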

    The repro below illustrates this: the split data consists of two result sets, one for training and the other for scoring.

    User's image

    Also, as a common practice in machine learning workflows, the data is typically split into training and testing sets during the initial setup of the experiment, before model training begins.

    For more details, please refer to: Configure training, validation, cross-validation and test data in automated machine learning
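
    The rule that imputation is applied to every split carries over to cross-validation, where each fold is its own split. The following is a rough sketch of that idea in plain Python, not the AutoML implementation: `k_fold_indices` and `impute_per_fold` are hypothetical helpers, the folds are contiguous and unshuffled for readability, and mean imputation is an assumed stand-in.

```python
from statistics import mean

def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        valid_idx = list(range(start, stop))
        train_idx = [i for i in range(n_rows) if i < start or i >= stop]
        yield train_idx, valid_idx
        start = stop

def impute_per_fold(values, k):
    """For each fold, fit a mean on the training rows and fill both parts."""
    results = []
    for train_idx, valid_idx in k_fold_indices(len(values), k):
        # The fill value is learned from this fold's training rows only.
        fill = mean(values[i] for i in train_idx if values[i] is not None)
        train = [fill if values[i] is None else values[i] for i in train_idx]
        valid = [fill if values[i] is None else values[i] for i in valid_idx]
        results.append((fill, train, valid))
    return results

# Hypothetical column with two missing entries, split into 3 folds.
column = [1.0, None, 3.0, 5.0, None, 7.0]
for fill, train, valid in impute_per_fold(column, k=3):
    print(fill, train, valid)
```

    Each fold ends up with its own fill value (5.0, 4.0, and 3.0 here), because the imputer is refit on that fold's training rows rather than once on the full column.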

    Hope this helps.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

