How do I force an Azure Machine Learning batch endpoint to rerun every time?

Question

Lee Harper 25

Within Azure ML, I have a machine learning pipeline running R code. I have successfully run this pipeline with the allow_reuse parameter set to false, which means that the pipeline reruns every time it is invoked. This is the required behavior.

I want to deploy this pipeline behind a batch endpoint in Azure ML, since R is not supported with batch endpoints out of the box. I have used a PipelineComponentBatchDeployment to do this via the CLI, and it works - I am able to create a deployment with this pipeline behind the batch endpoint.

Unfortunately, I can only run the pipeline once, because the batch endpoint inputs do not change run to run (data is loaded via SQL query during the ML pipeline). It turns out that the job created when we invoke the batch endpoint defaults to azureml.enforceRerun = "False", rather than inheriting the azureml.enforceRerun = "True" from the parent pipeline job.

I have already tried adding force_rerun: True to the deployment.yml file, per the documentation, to inject this into the invoked job, but it has no effect. We want to use the CLI for this rather than the Python SDK, for DevOps reasons.

In this situation, is there a way to ensure that the job created by invoking the batch endpoint reruns the complete pipeline every time it is executed?

romungi-MSFT 48,906 Reputation points Microsoft Employee Moderator

2024-03-19T08:25:53.8266667+00:00

@Lee Harper Could you check if any of these debug steps help to determine why the component or job is not re-run?

Also for batch endpoints you can specify inputs from different sources for each run as documented here. Does this help if you pass the inputs with every run?
Lee Harper 25 Reputation points

2024-03-19T20:47:31.7433333+00:00

Hi Ram,

Those debug steps are for the underlying pipeline, which when executed in isolation does run correctly with force rerunning. Pipeline jobs that execute from the pipeline directly do work correctly, but those don't have a persistent REST API which can be called from other applications.

Is there a mechanism by which I can make the pipeline job that the batch endpoint executes inherit those parameters from the pipeline that was deployed behind the endpoint? Currently it is not inheriting those parameters. So for example, even though the parent pipeline correctly has force rerun = True, the batch endpoint triggers a job with force rerun = False.

In terms of the inputs, we would rather not pass junk inputs into the REST API in a production scenario. The only two inputs that we currently have are two string literals: the input Synapse schema/table to read from, and the output Synapse schema/table to write to.
Lee Harper 25 Reputation points

2024-03-19T20:50:45.33+00:00

And per the debugging instructions, the batch endpoint pipeline job has forceRerun = False and isDeterministic = True, but since I'm not defining the pipeline job (I only define the deployment based on an executed pipeline) I don't have any easy way to update this.
Anonymous

2024-04-09T15:48:03.8666667+00:00

I have the same requirements and the same problem as @Lee Harper.

Is there any news on this issue?
Arthur Francisco Araujo Fernandes 20 Reputation points

2024-06-19T14:02:53.37+00:00

Is this still an issue!? I'm having the same problem! Any updates?

Answer 1

Arthur Francisco Araujo Fernandes 20

Since there is still no solution from MS, here is a BAD workaround to force a rerun EVERY TIME: add a dummy parameter to your batch job, such as a UID or a timestamp, so that the inputs differ on each invocation.
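
For anyone who wants to script this, here is a minimal sketch of the dummy-parameter idea. It only generates a unique value and builds the CLI command; it does not use the Python SDK and does not run anything. The input names (`input_table`, `output_table`, `run_token`), the endpoint name, and the table values are hypothetical. The `--set inputs.<name>.type=... inputs.<name>.default=...` form for literal inputs is how I recall the batch endpoint docs describing it, so treat it as an assumption and check it against your CLI version. It also assumes the deployed pipeline component declares a string input called `run_token` and passes it to at least one step, otherwise the changed value may not affect reuse.

```python
import shlex
import uuid
from datetime import datetime, timezone


def make_run_token() -> str:
    """Return a value that differs on every call: UTC timestamp plus a random suffix."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{uuid.uuid4().hex[:8]}"


def build_invoke_command(endpoint_name: str, literal_inputs: dict[str, str]) -> list[str]:
    """Build (but do not execute) an `az ml batch-endpoint invoke` command.

    ASSUMPTION: string literal inputs are passed with
    `--set inputs.<name>.type=string inputs.<name>.default=<value>`.
    Verify this against the Azure ML CLI documentation before relying on it.
    """
    cmd = ["az", "ml", "batch-endpoint", "invoke", "--name", endpoint_name, "--set"]
    for name, value in literal_inputs.items():
        cmd.append(f"inputs.{name}.type=string")
        cmd.append(f"inputs.{name}.default={value}")
    return cmd


if __name__ == "__main__":
    # Hypothetical input names and values; only run_token changes between invocations.
    inputs = {
        "input_table": "schema_in.table_in",
        "output_table": "schema_out.table_out",
        "run_token": make_run_token(),
    }
    # Print a shell-safe command line that can be pasted into a pipeline step.
    print(shlex.join(build_invoke_command("my-batch-endpoint", inputs)))
```

The real inputs stay unchanged and only `run_token` varies, so the job's inputs never match a previous run. It is still a workaround, not a fix for the enforceRerun setting not being inherited.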
