Azure Data Factory pipeline in queued status

Philip O'Rourke 111 Reputation points
2022-04-18T09:37:32.647+00:00

Hi,

New to Azure Data Factory.

I have created a data flow that reads a compressed .csv.gz file from Azure Blob Storage and loads it into an Azure SQL database.

The data flow is called from a pipeline and has one simple transformation that converts 2 string columns to dates.

It works on a small file of a couple of hundred rows, though even before it starts I can see the pipeline in queued status.

Not much CPU is being used on the database.

Why are pipelines going into queued status before executing when only 1 pipeline is being run?

How can we get them to start immediately?

Once they start they tend to run quickly.

Thanks

Azure Data Factory

Accepted answer
  1. AnnuKumari-MSFT 32,011 Reputation points Microsoft Employee
    2022-04-19T09:53:37.213+00:00

    Hi @Philip O'Rourke ,

    Executing a data flow takes a few minutes because a Spark cluster has to be spun up first. While your pipeline is in the queued state, ADF is actually getting the Spark cluster ready to perform the transformations.

    Data flows run on Spark clusters that are managed by ADF. Clusters are created on demand from scratch and destroyed after the job is done. That is why it takes 4-5 minutes to acquire compute. Once compute is acquired, the job runs, and the cluster is killed after the job run completes.

    There is a workaround, though: you can set the TimeToLive (TTL) on the Azure IR, which keeps the cluster alive for the next job if that job arrives within the TTL period. For example, if you set the TTL to 10 minutes, the cluster waits to see whether another job for the same IR arrives, and the cycle continues. If no job arrives within 10 minutes, the cluster is killed.
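
The TTL behaviour described above can be illustrated with a toy model. This is only a sketch of the reasoning, not ADF code: the function name `classify_starts`, the constant job duration, and the choice to ignore cluster startup time are all assumptions made for illustration.

```python
from typing import List, Optional


def classify_starts(job_times: List[float], job_duration: float, ttl: float) -> List[str]:
    """Label each job 'cold' (new cluster) or 'warm' (cluster reused).

    job_times: job submission times in minutes, in ascending order.
    job_duration: run time of each job in minutes (assumed constant).
    ttl: idle minutes the cluster is kept alive after a job finishes.
    """
    labels: List[str] = []
    cluster_expires_at: Optional[float] = None  # no cluster exists yet
    for start in job_times:
        if cluster_expires_at is not None and start <= cluster_expires_at:
            labels.append("warm")  # arrived before the idle timer ran out
        else:
            labels.append("cold")  # cluster must be created from scratch
        # The idle timer restarts once this job has completed.
        cluster_expires_at = start + job_duration + ttl
    return labels


if __name__ == "__main__":
    # Jobs submitted at minutes 0, 8 and 30, each running for 2 minutes.
    print(classify_starts([0, 8, 30], job_duration=2, ttl=10))
    print(classify_starts([0, 8, 30], job_duration=2, ttl=0))
```

With a 10-minute TTL the second job reuses the cluster and only the first and third pay the startup cost; with no TTL every job starts cold, which matches the queued time seen on each run in the question.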

    From the ADF pipeline designer UI, go to Connections > Integration Runtimes > New. Select Azure IR and then open the Data Flow Run Time properties section. There you will see the TimeToLive option, which is the allowed idle time for the data flow compute: it specifies how long the compute stays alive after a data flow run completes if there are no other active jobs.
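
For anyone who prefers to look at the integration runtime as JSON rather than through the UI, the sketch below builds a definition of roughly the shape ADF stores for an Azure IR with a data flow TTL. The property names (`computeProperties`, `dataFlowProperties`, `timeToLive`) follow the IR JSON format as I understand it and should be checked against the JSON the portal generates for your own IR. The name `DataFlowIR`, the helper `build_azure_ir_definition`, and the default core count are hypothetical.

```python
import json


def build_azure_ir_definition(name: str, ttl_minutes: int, core_count: int = 8) -> dict:
    """Return an Azure IR definition with a data flow TTL (property names assumed)."""
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": "AutoResolve",
                    "dataFlowProperties": {
                        "computeType": "General",
                        "coreCount": core_count,
                        # Idle minutes before the cluster is released.
                        "timeToLive": ttl_minutes,
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    definition = build_azure_ir_definition("DataFlowIR", ttl_minutes=10)
    print(json.dumps(definition, indent=2))
```

The data flow activity in the pipeline then needs to run on this IR rather than on the default one for the TTL to take effect.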

    Please refer to the following blog posts for more information:

    1. https://techcommunity.microsoft.com/t5/azure-data-factory-blog/adf-adds-ttl-to-azure-ir-to-reduce-data-flow-activity-times/ba-p/878380
    2. https://social.msdn.microsoft.com/Forums/en-US/91d388d9-730f-4d53-93b2-2a8697513511/azure-dataflow-execution-behaviour?forum=AzureDataFactory

    Hope this helps. Please let us know if you have any further queries.
