
How to specify the source directory for a pipeline step in Azure Machine Learning

Roopesh Bharatwaj K R 6 Reputation points
2022-12-05T17:50:34.633+00:00

Hi,

I'm trying to specify the source directory for a pipeline step. I have tried several approaches but could not find a solution.

Below is an example of my setup: I'm trying to specify CodeBase as the source directory, with Data.py as the entry script and Datapipeline.py as my pipeline file.

Folder Structure :

CodeBase
---Data.Py
---Data1.py
Pipeline
----Datapipeline.py

Code Example:
from azureml.pipeline.steps import PythonScriptStep

path = './CodeBase/'
dataprep_source_dir = path
entry_point = "Data.py"
data_prep_step = PythonScriptStep(
    name='Inference_Service',
    script_name=entry_point,
    source_directory=dataprep_source_dir,
    inputs=[Top_150_Merchants.as_named_input('Top_150_Merchants'),
            acquire.as_named_input('Weekly_volume_Acquire')],
    outputs=[datafolder],
    compute_target=compute_target,
    runconfig=aml_run_config,
    allow_reuse=True
)

ValueError: Step [Inference_Service]: script not found at: /mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-mural-gpu/code/Users/Roopesh.Bharatwaj/Mural_Code/Pipeline/CodeBase/Data.py.

Make sure to specify an appropriate source_directory on the Step or default_source_directory on the Pipeline.

Kindly let me know if anyone can help me with this. Thank you!

Azure Machine Learning

1 answer

  1. Ramr-msft 17,836 Reputation points
    2022-12-06T13:12:53.947+00:00

    @Roopesh Bharatwaj K R Thanks for the question. Here is a sample that shows how to specify the source directory. If you are still facing the problem, please share the code that you are using.

    https://learn.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py

    # step4 consumes the datasource (Datareference) in the previous step  
    # and produces processed_data1  
    trainStep = PythonScriptStep(  
        script_name="train.py",   
        arguments=["--input_data", blob_input_data, "--output_train", processed_data1],  
        inputs=[blob_input_data],  
        outputs=[processed_data1],  
        compute_target=aml_compute,   
        source_directory=source_directory,  
        runconfig=run_config  
    )  
    print("trainStep created")  
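
The error in the question looks for the script under Pipeline/CodeBase/Data.py, which suggests that the relative path './CodeBase/' is being resolved against the Pipeline folder, while the folder listing shows CodeBase next to Pipeline rather than inside it. If that reading is right, '../CodeBase/' may already be enough. The sketch below is one way to build an absolute path instead, so the result does not depend on the working directory. The helper name resolve_source_dir is hypothetical (it is not part of the Azure ML SDK), and the layout is assumed to match the question. In a notebook, where __file__ is not defined, a path inside Path.cwd() could be passed instead.

```python
from pathlib import Path


def resolve_source_dir(pipeline_file, source_folder="CodeBase", entry_point="Data.py"):
    """Return the absolute path of `source_folder`, assumed to sit next to
    the folder that contains `pipeline_file` (layout taken from the question).

    Hypothetical helper for illustration; not an Azure ML SDK function.
    """
    pipeline_dir = Path(pipeline_file).resolve().parent  # .../Pipeline
    source_dir = pipeline_dir.parent / source_folder     # .../CodeBase
    script = source_dir / entry_point

    # Fail early with a clear message instead of waiting for the step to be built.
    if not script.is_file():
        raise FileNotFoundError(f"entry script not found: {script}")
    return str(source_dir)


# Intended use inside Pipeline/Datapipeline.py (not executed here):
#   dataprep_source_dir = resolve_source_dir(__file__)
#   data_prep_step = PythonScriptStep(script_name="Data.py",
#                                     source_directory=dataprep_source_dir, ...)
```

The check on the entry script is only a convenience: it reports a wrong folder or a file-name mismatch (for example Data.Py versus Data.py on a case-sensitive file system) before the pipeline is constructed.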
    
