@Vinoth Kumar K Thanks for the question. Can you please share the sample notebook that you are trying?
Here are the notebook and doc that can help:
Use OutputFileDatasetConfig as a means to pass data between pipeline steps; see Tutorial: ML pipelines for batch scoring - Azure Machine Learning | Microsoft Learn.
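As background for the type-loss issue discussed below, here is a minimal local sketch (plain pandas, no Azure ML SDK involved) showing that a schema-carrying file format such as pickle preserves column dtypes across the file boundary between two steps, whereas a delimited text file would not. Parquet behaves the same way as pickle here; the file name is illustrative only.

```python
import os
import tempfile

import pandas as pd

# A frame with mixed dtypes, standing in for a typed tabular dataset.
df = pd.DataFrame({
    "id": pd.array([1, 2, 3], dtype="int64"),
    "score": pd.array([0.5, 1.5, 2.5], dtype="float64"),
    "ts": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
})

with tempfile.TemporaryDirectory() as tmp:
    # "Step 1" writes its output in a format that stores the schema.
    path = os.path.join(tmp, "step1_output.pkl")
    df.to_pickle(path)

    # "Step 2" reads it back: dtypes survive the round trip.
    df2 = pd.read_pickle(path)

assert (df2.dtypes == df.dtypes).all()
```

The same idea applies inside the pipeline step scripts: if the intermediate data is written in a typed format rather than CSV, the downstream step does not have to re-infer column types.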
In a multi-step pipeline execution, how do I maintain the data types of the columns when passing the dataset to the next step?
I am building a pipeline with multiple steps.
- Step 1 - Reads the data from a tabular dataset (with proper data types), applies transformations, and creates an output dataset that is passed as input to step 2. However, when I opened this dataset from the pipeline run log, the data types had all become string instead of maintaining the original data types of the input tabular dataset.
- Step 2 - Uses the output dataset of step 1 as input and applies some more transformations. However, I have some logic based on data types that doesn't work, because the intermediate dataset does not maintain the same data structure.
Is there any way we can maintain the original data types/schema structure in the intermediate datasets?
Here are some snippets of my code:
feature_work = (
    OutputFileDatasetConfig(
        name="data_enhanced_add_global_variables",
        destination=(def_blob_store, "data/processed/output/1"),
    )
    .read_delimited_files()
    .as_upload(overwrite=True)
)
feature_engineering_step_1 = PythonScriptStep(
    name="1_feature_engineering",
    # source_directory=experiment_folder,
    script_name="1_feature_engineering.py",
    arguments=['--input-data', data_aggregate_DS.as_named_input('raw_data'),
               '--prepped-data', feature_work],
    # outputs=[prepped_data_folder],
    outputs=[feature_work],
    compute_target=compute_name,
    runconfig=pipeline_run_config,
    allow_reuse=True)
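For context, a step script wired up this way typically parses the `--input-data` and `--prepped-data` arguments with argparse. A minimal sketch of what `1_feature_engineering.py` might do (the helper name and sample values are illustrative, not from the original post):

```python
import argparse


def parse_step_args(argv=None):
    """Parse the arguments that PythonScriptStep passes on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-data", dest="input_data",
                        help="named input dataset (dataset id or mounted path)")
    parser.add_argument("--prepped-data", dest="prepped_data",
                        help="output folder backed by OutputFileDatasetConfig")
    return parser.parse_args(argv)


# Simulating the command line the pipeline run would build:
args = parse_step_args(["--input-data", "raw_data",
                        "--prepped-data", "/mnt/output/1"])
```

Inside a real run, the script would read the input from `args.input_data` and write its results under `args.prepped_data`.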
Step 2
feature_engineering_step_2 = PythonScriptStep(
    name="2_feature_engineering",
    # source_directory=experiment_folder,
    script_name="2_feature_engineering.py",
    arguments=['--input-data', feature_work.as_input(name='raw_data'),
               '--prepped-data', feature_work1],
    outputs=[feature_work1],
    compute_target=compute_name,
    runconfig=pipeline_run_config,
    allow_reuse=True)
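The symptom described in step 1 can be reproduced locally with plain pandas: a delimited (CSV) file stores every value as text, so type information has to be re-declared when the file is read back. Azure ML's `read_delimited_files` accepts a `set_column_types` argument for the same purpose; the snippet below is a local analogy using pandas only, not Azure ML code.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "ts": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
})

# Round-trip through CSV, which carries no schema.
buf = io.StringIO()
df.to_csv(buf, index=False)

# Read back with no type hints: the datetime column degrades to strings.
buf.seek(0)
lost = pd.read_csv(buf)

# Re-declare the types on read, analogous to set_column_types:
buf.seek(0)
fixed = pd.read_csv(buf, parse_dates=["ts"], dtype={"id": "int64"})
```

So the intermediate dataset showing all columns as string is expected for a delimited intermediate file; the types either need to be declared when the dataset is read, or the intermediate data needs to be written in a typed format such as Parquet.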
1 answer
Ramr-msft 17,741 Reputation points
2022-02-28T12:43:31.087+00:00