Hi @Saurabh Dhoble
Thank you for contacting Microsoft Q&A.
In Azure AutoML, the split into training and validation data can happen either before or after the featurization and imputation stages. If the split is done before cleaning and imputation, then cleaning and imputation have to be applied to each split.
If the split is done after the whole dataset has been cleaned and imputed, then the resulting splits are already in a cleaned and imputed state.
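To make the two orderings concrete, here is a minimal, generic sketch in plain Python using mean imputation on a single numeric column. It is not Azure AutoML's internal implementation. The function names, the 25% validation fraction and the sample values are hypothetical. In the split-first variant, the fill value is learned on the training split and reused for the validation split. That is one common choice (it keeps validation statistics out of training), and it is an assumption here.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple

Column = List[Optional[float]]


def mean_impute(values: Column, fill: Optional[float] = None) -> List[float]:
    """Replace missing values (None) with `fill`, or with the column mean if no fill is given."""
    if fill is None:
        fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def split(values: list, valid_fraction: float, seed: int = 0) -> Tuple[list, list]:
    """Shuffle row indices with a fixed seed and return (train, valid)."""
    rng = random.Random(seed)
    idx = list(range(len(values)))
    rng.shuffle(idx)
    n_valid = int(len(values) * valid_fraction)
    valid = [values[i] for i in idx[:n_valid]]
    train = [values[i] for i in idx[n_valid:]]
    return train, valid


def impute_then_split(values: Column, valid_fraction: float = 0.25, seed: int = 0):
    # Clean the whole dataset first: both splits come out already imputed.
    return split(mean_impute(values), valid_fraction, seed)


def split_then_impute(values: Column, valid_fraction: float = 0.25, seed: int = 0):
    # Split first, then impute each split separately. The fill value is
    # computed on the training split only and reused for validation.
    train, valid = split(values, valid_fraction, seed)
    train_mean = mean(v for v in train if v is not None)
    return mean_impute(train, train_mean), mean_impute(valid, train_mean)
```

For example, with `data = [1.0, None, 3.0, 4.0, None, 6.0, 7.0, 8.0]`, both `impute_then_split(data)` and `split_then_impute(data)` return six training rows and two validation rows with no missing values. They differ only in which mean fills the gaps: the mean of the full dataset in the first case, and the mean of the training split in the second.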
The repro below illustrates this: the split data consists of two result sets, one for training and the other for scoring.
Also, as common practice in machine learning workflows, the data is typically split into training and test sets during the initial setup of the experiment, before model training begins.
For more details, please refer to Configure training, validation, cross-validation and test data in automated machine learning.
Hope this helps.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".