How to create dependency within steps in Azure ML Pipelines without using Input/Output?

Subhodeep Chakraborty 0 Reputation points
2024-04-08T14:24:59.4566667+00:00

Hello Azure Team,

In my company we are testing Azure ML and would like to migrate our current Airflow service, which orchestrates most of our daily-to-monthly scheduled DAGs, to Azure ML Pipelines (Python SDK v2). Most of the steps in each of our DAGs interact with an Azure SQL Server for data load/save operations. While authoring our jobs and pipelines in Azure ML SDK v2, I noticed that there is no easy way to specify a simple order of execution among the jobs without explicitly declaring an Input and an Output. In our use case the steps do not produce any of the accepted Output types, i.e. 'uri_folder', 'uri_file', 'mltable', 'mlflow_model' or 'custom_model'. So my question is: is there a way to define the order as easily as we could in Python SDK v1, using something like PipelineStep.run_after?


1 answer

  1. YutongTie-MSFT 51,756 Reputation points
    2024-04-08T22:48:50.99+00:00

    @Subhodeep Chakraborty Thanks for reaching out to us, and sorry for the inconvenience. The v2 pipeline job does not support a "run_after" feature yet, so you may need to add placeholder inputs/outputs to set the sequence.

    The SDK v2 is designed around the concept of data dependencies and uses them to determine the order of execution.

    However, there are workarounds you can use to enforce the order of execution. Please give them a try.

    1. Dummy Output: You can create a dummy output in one step and use it as an input to the next step. Even though this data won't be used, it will enforce the order of execution.
    2. Use the SDK v1 PythonScriptStep: PythonScriptStep belongs to SDK v1 (azureml.pipeline.steps), is still available, and supports the run_after method. This lets you specify the order of execution without any data dependencies, but it means authoring that pipeline with SDK v1 rather than v2.
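
As an illustration of workaround 1, here is a minimal sketch of a step script that accepts a dummy input folder and writes a dummy output folder. It uses only the Python standard library; the file name `done.txt`, the argument names `--after` and `--done`, and the function names are hypothetical and not part of any Azure ML API. In an SDK v2 pipeline, the upstream command would typically declare an output of type `uri_folder` and pass it as `--done ${{outputs.done}}`, while the downstream command would declare a matching input, bind it to the upstream step's output, and pass it as `--after ${{inputs.after}}`. That binding is what makes the pipeline run the steps in order.

```python
import argparse
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical marker file name; any small file would do.
MARKER_NAME = "done.txt"


def write_marker(output_dir: str) -> Path:
    """Write a small marker file into the step's dummy output folder."""
    folder = Path(output_dir)
    folder.mkdir(parents=True, exist_ok=True)
    marker = folder / MARKER_NAME
    marker.write_text(datetime.now(timezone.utc).isoformat())
    return marker


def marker_exists(input_dir: str) -> bool:
    """Return True if the upstream step left its marker in the folder."""
    return (Path(input_dir) / MARKER_NAME).is_file()


def main(argv=None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--after", help="dummy input folder from the upstream step")
    parser.add_argument("--done", help="dummy output folder for the downstream step")
    args = parser.parse_args(argv)

    # Optional sanity check: the upstream step should have written its marker.
    if args.after and not marker_exists(args.after):
        raise SystemExit("upstream marker not found")

    # ... the step's real work (e.g. the Azure SQL load/save) would go here ...

    # Signal completion so a downstream step can depend on this one.
    if args.done:
        write_marker(args.done)


if __name__ == "__main__":
    main()
```

The first step in a chain would be called with only `--done`, the last with only `--after`, and any step in between with both. The marker content is never read for its value; it only gives the pipeline a data dependency to order the steps by.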

    I hope this helps. As an update, I have shared this feedback with the product team, and I will post here again if there is a roadmap for adding this feature to v2.

    Thanks for sharing the feedback here again.

    Regards,

    Yutong

    - Please accept the answer if you found it helpful, to support the community. Thanks a lot.

    1 person found this answer helpful.
