How to create dependency within steps in Azure ML Pipelines without using Input/Output?

Question

Subhodeep Chakraborty 0

Hello Azure Team,

At my company we are evaluating Azure ML and would like to migrate our current Airflow service, which orchestrates most of our daily-to-monthly scheduled DAGs, to Azure ML Pipelines (Python SDK v2). Most of the steps in each DAG interact with an Azure SQL Server for data load/save operations. While authoring our jobs and pipelines in Azure ML SDK v2, I noticed that there is no easy way to specify a simple order of execution among the jobs without explicitly defining an Input and an Output. In our use case the steps do not produce any of the accepted Output types, i.e. none of 'uri_folder', 'uri_file', 'mltable', 'mlflow_model' or 'custom_model'. So my question is: is there a way to define the order as easily as we could in Python SDK v1, using something like PipelineStep.run_after?

Subhodeep Chakraborty 0 Reputation points

2024-04-09T07:49:49.35+00:00

Thanks for the quick response. Just to add it here: I found this recommended answer by the Microsoft Azure collective on StackOverflow, "Azure Machine Learning SDK V1 migration to V2 Pipeline Steps". Personal opinion: the solution is too invasive, and it also adds the need to write test cases for the home-grown graph builder class and functions.

I would try and work something out with the PythonScriptStep.

YutongTie-MSFT 53,971 Reputation points Moderator

2024-04-22T05:02:06.85+00:00

@Subhodeep Chakraborty Thanks for reaching out to us again and sharing your feedback. As an update, I have shared this feedback with the product team, and I will post here again if there is a roadmap for adding this feature to V2.

Thanks for your feedback again.

Regards,

Yutong

1 answer

Answer 1

@Subhodeep Chakraborty Thanks for reaching out to us. I am sorry, but v2 pipeline jobs do not support the "run_after" feature yet, so you may need to define placeholder inputs/outputs to control the sequence.

The SDK v2 is designed around the concept of data dependencies and uses them to determine the order of execution.
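To make the idea concrete, here is a minimal, standalone sketch of how an execution order can be derived purely from input/output bindings. It uses only the Python standard library and is an illustration of the principle, not the SDK's actual implementation. The step names and input names (`load_from_sql`, `raw_done`, and so on) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps. Each step maps its input names to the step
# whose output is bound to that input. No bindings means no ordering.
steps = {
    "load_from_sql": {},
    "transform": {"raw_done": "load_from_sql"},
    "save_to_sql": {"transform_done": "transform"},
}


def execution_order(step_inputs):
    """Return a valid run order derived only from input/output bindings."""
    # TopologicalSorter expects {node: set of predecessors}.
    graph = {
        name: set(bindings.values())
        for name, bindings in step_inputs.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(steps)
print(order)
```

If the two bindings are removed, nothing constrains the three steps relative to each other, which is why a step with no Input/Output link to another step cannot be forced to run after it.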

However, there are workarounds that you can use to enforce the order of execution. Please give them a try:

Dummy output: create a dummy output in one step and pass it as an input to the next step. Even though the data is never used, the binding enforces the order of execution.
Use the SDK v1 PythonScriptStep instead of an SDK v2 step: PythonScriptStep is still available in SDK v1 and supports the run_after method, which lets you specify the order of execution without any data dependencies.
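For the dummy output workaround, the step scripts themselves only need to write and, optionally, check a small marker file. The sketch below shows that script-side logic under some assumptions: the marker file name and the two helper functions are hypothetical, and the folder is simulated locally with a temporary directory. In a real pipeline the upstream step would declare an output of type 'uri_folder' and the downstream step would receive that same output as an input.

```python
import tempfile
from pathlib import Path

MARKER_NAME = "_done.txt"  # hypothetical marker file name


def write_marker(output_dir: str) -> Path:
    """Upstream step: call this last, after the SQL work has succeeded."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    marker = out / MARKER_NAME
    marker.write_text("done\n")
    return marker


def check_marker(input_dir: str) -> bool:
    """Downstream step: optional sanity check before touching SQL."""
    return (Path(input_dir) / MARKER_NAME).is_file()


# Local simulation: a temporary directory stands in for the folder
# that the pipeline would mount for the dummy output/input.
with tempfile.TemporaryDirectory() as folder:
    before = check_marker(folder)
    write_marker(folder)
    after = check_marker(folder)

print(before, after)
```

The downstream step does not have to read the marker at all, since the binding alone creates the dependency. Checking it is just a cheap guard in case the upstream script exits early without failing.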

I hope this helps. As an update, I have shared this feedback with the product team, and I will post here again if there is a roadmap for adding this feature to V2.

Thanks for sharing the feedback here again.

Regards,

Yutong

Please accept the answer if you found it helpful, to support the community. Thanks a lot.
