In Azure AutoML, does data splitting (train/test and validation) happen after the featurization and imputation stage, or is a clean holdout set aside before the imputation logic is applied?

Question

Saurabh Dhoble 20

I am trying to understand at what stage data splitting happens.

dupammi 8,615 Reputation points Microsoft External Staff

2024-02-01T01:47:16.1866667+00:00

Hi @Saurabh Dhoble, following up to see if the answer below was helpful. If it answers your query, please click Accept Answer and Yes for "Was this answer helpful?".
dupammi 8,615 Reputation points Microsoft External Staff

2024-02-02T01:24:28.13+00:00

Hi @Saurabh Dhoble, following up to see if the suggestion below was helpful. If it answers your query, please click Accept Answer and Yes for "Was this answer helpful?".

Accepted answer

Answer 1

Hi @Saurabh Dhoble

Thank you for contacting Microsoft Q&A.

In Azure AutoML, the data split for training and validation can occur either before or after the featurization and imputation stages. If the split is done before cleaning and imputation, then cleaning and imputation must be applied to each split.

If the split is done after the whole dataset has been cleaned and imputed, then the split data will already be in a cleaned and imputed state.

The repro below illustrates this: the split data consists of two result sets, one for training and the other for scoring.

User's image
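
To make the two orderings concrete, here is a minimal sketch in plain Python. It is not how Azure AutoML implements featurization internally; the helper names (`fit_mean`, `impute`, `split`) and the toy data are assumptions made up for illustration. Option A splits first and learns the fill value from the training rows only, so the holdout stays clean. Option B imputes the whole dataset and then splits, so the fill value is computed with the holdout rows included.

```python
import random
from statistics import mean


def fit_mean(values):
    """Return the mean of the non-missing entries (the imputation statistic)."""
    observed = [v for v in values if v is not None]
    return mean(observed)


def impute(values, fill):
    """Replace missing entries (None) with the given fill value."""
    return [fill if v is None else v for v in values]


def split(values, holdout_fraction, seed=0):
    """Shuffle a copy of the data and return (train, holdout)."""
    rows = list(values)
    random.Random(seed).shuffle(rows)
    n_holdout = int(len(rows) * holdout_fraction)
    return rows[n_holdout:], rows[:n_holdout]


# Toy single-column dataset (hypothetical); None marks a missing value.
data = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0, 9.0, 100.0]

# Option A: split first, then fit the imputer on the training rows only
# and reuse that same fill value for the holdout.
train, holdout = split(data, holdout_fraction=0.3)
fill_a = fit_mean(train)
train_a = impute(train, fill_a)
holdout_a = impute(holdout, fill_a)

# Option B: impute the whole dataset, then split.
# The fill value here is computed from all rows, including the future holdout.
fill_b = fit_mean(data)
train_b, holdout_b = split(impute(data, fill_b), holdout_fraction=0.3)

print("fill value, split then impute:", fill_a)
print("fill value, impute then split:", fill_b)
```

In general the two fill values differ, because Option B's statistic also depends on rows that later end up in the holdout. If a clean holdout matters for your evaluation, Option A is the ordering to aim for.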

Also, as a common practice in machine learning workflows, the data is typically split into training and testing sets during the initial setup of the experiment, before model training begins.

For more details, please refer to "Configure training, validation, cross-validation and test data in automated machine learning".

Hope this helps.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful?".
