Azure Data Factory pipeline stuck in Queued status even when the Batch node is idle

Muhammad Hamza Shafiq 0 Reputation points
2023-04-26T15:57:32.34+00:00

Hi,
I have multiple triggers for my Data Factory jobs, which use an integration runtime and run on a dedicated single node. Sometimes, even when the node is idle, the pipeline stays in the Queued state for an hour and then fails with a runtime error. The problem then resolves itself and the pipeline starts working fine after an hour or two. I have monitored my Batch node and integration runtime, and everything works fine. This has happened multiple times now. Is there anything I am missing?

Azure Batch
Azure Data Factory

1 answer

  1. AnnuKumari-MSFT 30,676 Reputation points Microsoft Employee
    2023-04-28T05:29:53.7933333+00:00

    Hi Muhammad Hamza Shafiq,

    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    As I understand your question, your pipeline is getting stuck in the Queued state for a long period of time and is failing due to an integration runtime error. Please let me know if that is not the case.

    This can happen for various reasons, such as hitting concurrency limits, service outages, or transient network failures.

    Resolution

    Concurrency limit: If your pipeline has a concurrency policy, verify that there are no old runs of the same pipeline still in progress.

    Monitoring limits: Go to the ADF authoring canvas, select your pipeline, and determine if it has a concurrency property assigned to it. If it does, go to the Monitoring view, and make sure there's nothing in the past 45 days that's in progress. If there is something in progress, you can cancel it and the new pipeline run should start.
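
    If you would rather do that same check programmatically, here is a minimal sketch using the ADF REST API (the Pipeline Runs - Query By Factory and Pipeline Runs - Cancel operations). The subscription, resource group, and factory names are placeholders you would fill in:

    ```python
    # Sketch: list pipeline runs from the last 45 days that are still "InProgress",
    # then optionally cancel them so a new run can start.
    # <subscription-id>, <resource-group>, <factory-name> are placeholders.
    import datetime
    import requests
    from azure.identity import DefaultAzureCredential

    SUB, RG, FACTORY = "<subscription-id>", "<resource-group>", "<factory-name>"
    BASE = (f"https://management.azure.com/subscriptions/{SUB}"
            f"/resourceGroups/{RG}/providers/Microsoft.DataFactory/factories/{FACTORY}")
    PARAMS = {"api-version": "2018-06-01"}

    token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
    headers = {"Authorization": f"Bearer {token}"}

    now = datetime.datetime.now(datetime.timezone.utc)
    body = {
        "lastUpdatedAfter": (now - datetime.timedelta(days=45)).isoformat(),
        "lastUpdatedBefore": now.isoformat(),
        "filters": [{"operand": "Status", "operator": "Equals", "values": ["InProgress"]}],
    }
    runs = requests.post(f"{BASE}/queryPipelineRuns", params=PARAMS,
                         headers=headers, json=body).json().get("value", [])

    for run in runs:
        print(f"Still in progress: {run['pipelineName']} ({run['runId']})")
        # Uncomment to cancel the stale run:
        # requests.post(f"{BASE}/pipelineruns/{run['runId']}/cancel",
        #               params=PARAMS, headers=headers)
    ```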

    Transient issues: It is possible that your run was impacted by a transient network issue, credential failures, service outages, etc. When this happens, Azure Data Factory has an internal recovery process that monitors all the runs and restarts them when it notices something went wrong. This process runs every hour, so if your run is stuck for more than an hour, create a support case. You can also rerun pipelines and activities as described here, and rerun individual activities after a cancellation or failure as per Rerun from activity failures.
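
    As a sketch of the rerun option via the REST API: the Pipelines - Create Run operation accepts referencePipelineRunId, isRecovery, and startFromFailure parameters, which let you resume a failed run from the failed activity. All names and IDs below are placeholders:

    ```python
    # Sketch: rerun a failed pipeline run from the point of failure
    # (Pipelines - Create Run in recovery mode). All names/IDs are placeholders.
    import requests
    from azure.identity import DefaultAzureCredential

    SUB, RG, FACTORY = "<subscription-id>", "<resource-group>", "<factory-name>"
    PIPELINE, FAILED_RUN_ID = "<pipeline-name>", "<failed-run-id>"
    url = (f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
           f"/providers/Microsoft.DataFactory/factories/{FACTORY}"
           f"/pipelines/{PIPELINE}/createRun")
    token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

    resp = requests.post(url, headers={"Authorization": f"Bearer {token}"},
                         params={"api-version": "2018-06-01",
                                 "referencePipelineRunId": FAILED_RUN_ID,
                                 "isRecovery": "true",         # rerun in recovery mode
                                 "startFromFailure": "true"})  # resume from the failed activity
    print(resp.json())  # returns the new run's {"runId": "..."}
    ```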

    For more details, kindly visit the troubleshooting guide here: Pipeline status is queued or stuck for a long time

    Additionally, you can use the REST API to check the IR node status before initiating the main pipeline. You can find more details regarding the implementation here: How to get status of all the nodes for Self hosted IR and loop until all nodes are active
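
    The linked post shows one implementation; below is a minimal sketch of the same idea, assuming a self-hosted IR and using the Integration Runtimes - Get Status REST operation (all names are placeholders):

    ```python
    # Sketch: poll a self-hosted IR until every node reports "Online",
    # at which point it is safe to trigger the main pipeline.
    # <subscription-id>, <resource-group>, <factory-name>, <ir-name> are placeholders.
    import time
    import requests
    from azure.identity import DefaultAzureCredential

    SUB, RG, FACTORY, IR = "<subscription-id>", "<resource-group>", "<factory-name>", "<ir-name>"
    url = (f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
           f"/providers/Microsoft.DataFactory/factories/{FACTORY}"
           f"/integrationRuntimes/{IR}/getStatus")
    token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
    headers = {"Authorization": f"Bearer {token}"}

    while True:
        status = requests.post(url, headers=headers,
                               params={"api-version": "2018-06-01"}).json()
        nodes = status["properties"]["typeProperties"].get("nodes", [])
        states = {n["nodeName"]: n["status"] for n in nodes}
        print(states)
        if nodes and all(s == "Online" for s in states.values()):
            break       # all nodes are up
        time.sleep(30)  # wait before polling again
    ```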

    Hope it helps. Kindly accept the answer if it helped, as accepted answers help the community as well.