Controlling the queuing of pipelines in Azure Data Factory

rt 31 Reputation points
2021-04-05T14:23:27.247+00:00

Hello,

I am looking for a way to queue runs of a pipeline when a run with the same set of parameters is already in progress.

For example, if I have a generic pipeline with one parameter, I would only want concurrent runs when the parameter values differ between the runs.
If two requests were made to run the pipeline, first with parameter "A" and then with parameter "B", I would want these to execute in parallel.
However, if two requests were made with the same parameter "A", I would want the second request to be queued by Data Factory until the first one finishes.

One option would be to have a separate pipeline for each parameter value. Concurrency could then be controlled at the pipeline level, but this would lead to a large number of pipelines to create, which we do not want to manage. Is there another alternative to control the queuing of pipelines?

Thanks.

Azure Data Factory

Accepted answer
  1. MartinJaffer-MSFT 26,021 Reputation points
    2021-04-05T18:18:45.543+00:00

Hello @rt and welcome to Microsoft Q&A. Thank you for your well-considered question.

    At this time, there is no feature to control pipeline concurrency with respect to parameter values.

Other than the option you outlined, I can think of one more possibility: a sort of home-brewed custom scheduler. The details depend on whether you are starting pipelines manually or via triggers.

If you are triggering manually, then you may want to build an application outside of Data Factory. This application would receive requests for pipeline runs and query the Data Factory service to either get the status of existing runs or start a new run. It could use the REST API, PowerShell, or another SDK/CLI. This approach has an advantage over the next option: it can provide feedback to the user.
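The external-application idea above can be sketched against the ADF management REST API (`queryPipelineRuns` and `createRun`, api-version 2018-06-01). A minimal stdlib-only Python sketch, assuming you already hold an Azure AD bearer token and substituting your own subscription, resource group, and factory names; the function names (`has_conflict`, `start_if_free`, etc.) are hypothetical helpers, not part of any SDK:

```python
# Sketch: only start a pipeline run when no in-progress run has the same
# parameters. Assumes a valid Azure AD bearer token; the helper names and
# the BASE placeholder values are illustrative, not a real SDK.
import json
import urllib.request

API_VERSION = "2018-06-01"
BASE = ("https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
        "/providers/Microsoft.DataFactory/factories/{factory}")


def _post(url: str, body: dict, token: str) -> dict:
    """POST a JSON body to the ARM endpoint and return the parsed response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def in_progress_runs(base: str, token: str, pipeline: str,
                     window_start: str, window_end: str) -> list:
    """List in-progress runs of `pipeline` via the queryPipelineRuns API."""
    body = {
        "lastUpdatedAfter": window_start,
        "lastUpdatedBefore": window_end,
        "filters": [
            {"operand": "PipelineName", "operator": "Equals",
             "values": [pipeline]},
            {"operand": "Status", "operator": "Equals",
             "values": ["InProgress"]},
        ],
    }
    url = f"{base}/queryPipelineRuns?api-version={API_VERSION}"
    return _post(url, body, token)["value"]


def has_conflict(runs: list, params: dict) -> bool:
    """True if any listed run was started with exactly the same parameters."""
    return any(run.get("parameters", {}) == params for run in runs)


def start_if_free(base: str, token: str, pipeline: str, params: dict,
                  window_start: str, window_end: str):
    """Start a run only when no in-progress run uses the same parameters.

    Returns the new runId, or None when the caller should queue and retry.
    """
    runs = in_progress_runs(base, token, pipeline, window_start, window_end)
    if has_conflict(runs, params):
        return None
    url = f"{base}/pipelines/{pipeline}/createRun?api-version={API_VERSION}"
    return _post(url, params, token)["runId"]
```

A caller that gets `None` back would hold the request in its own queue and retry after a delay, which is exactly the "queued until the first one finishes" behavior asked about.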

While roundabout, it is possible to use the ADF Web activity to query the ADF service and get the list of current or past pipeline runs. Suppose we have a pipeline which takes as input the parameters to start another pipeline with. This pipeline would query the service, check for in-progress runs, and compare parameters. Depending on the result, it could trigger the desired pipeline and pass the parameters, wait and check again later, or halt.
Your triggers would point to this "proxy" pipeline. This would necessitate one "proxy" pipeline for every "business" pipeline. I hope this increase would be smaller than the number of parameter permutations.
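The query step of such a proxy pipeline might look like the following Web activity fragment (a sketch, not a complete pipeline: the pipeline name "BusinessPipeline" and the angle-bracket placeholders are assumptions you would replace, and the factory's managed identity would need Data Factory Contributor access for the MSI authentication to work):

```json
{
  "name": "CheckInProgressRuns",
  "type": "WebActivity",
  "typeProperties": {
    "url": "https://management.azure.com/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.DataFactory/factories/<factoryName>/queryPipelineRuns?api-version=2018-06-01",
    "method": "POST",
    "body": {
      "lastUpdatedAfter": "@{adddays(utcnow(), -1)}",
      "lastUpdatedBefore": "@{utcnow()}",
      "filters": [
        { "operand": "PipelineName", "operator": "Equals", "values": [ "BusinessPipeline" ] },
        { "operand": "Status", "operator": "Equals", "values": [ "InProgress" ] }
      ]
    },
    "authentication": { "type": "MSI", "resource": "https://management.azure.com/" }
  }
}
```

A downstream If Condition activity could then compare the returned runs' parameters against the proxy pipeline's own parameters and either call Execute Pipeline or loop back through a Wait activity.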

    Did I communicate effectively?


1 additional answer

  1. Javier Guerrero 61 Reputation points
    2023-04-19T01:41:36.5366667+00:00

I'm having the same issue. I'm thinking I could use Event Grid as a trigger and manage the duplication inside the grid.
