@Puifai Santisakultarm welcome to the Microsoft Q&A community.
Currently Azure Machine Learning pipelines do not currently offer a native "retry on failure" setting for individual steps.
But you can try the following options below:
- Use Azure Logic Apps or Azure Functions
You can set up a Logic App or Function to monitor pipeline run statuses. If a run fails, it can automatically trigger a re-run of the pipeline via its endpoint. This approach is lightweight and is GUI-friendly:
Create a PipelineEndpoint for your ML pipeline.
Configure a Logic App to listen for failed runs.
On failure, the Logic App can call the endpoint to re-trigger the pipeline.
- Custom Retry Logic in Your Script
If the failure is intermittent (e.g., network hiccups), you can wrap the logic in your PythonScriptStep with retry logic using try/except
and a loop.
- Split the Pipeline into Smaller Pipelines
Break your pipeline into modular sub-pipelines. This way, if a step fails, you only need to re-run the failed sub-pipeline rather than the entire workflow. You can orchestrate these using Logic Apps or even a custom controller script.
- Track Step Outputs and Use
allow_reuse=True
If your steps produce outputs that are cached, Azure ML can skip re-running successful steps when you manually re-trigger the pipeline. This doesn’t retry automatically, but it minimizes redundant computation.
N/B: I have generated the above answer using co-pilot as an AI tool. Also I have validated and updated the AI output.
I hope these helps. Let me know if you have any further questions or need additional assistance.
Also if these answers your query, do click the "Upvote" and click "Accept the answer" of which might be beneficial to other community members reading this thread.