How to reuse existing synapse mapping dataflow spark cluster

Question

I am trying to execute a Synapse mapping data flow inside a ForEach loop. Every time the data flow activity runs, the cluster takes at least 3 minutes to start. So if the loop executes 5 times, that adds an extra 12 to 15 minutes. How can I avoid this cluster startup time on each iteration?

Accepted Answer

Hi @Anonymous ,

Thank you for using the Microsoft Q&A platform and for posting your question here.

As I understand it, you want to reduce the pipeline's execution time by cutting the Spark cluster spin-up time. Please correct me if I have misunderstood your query.

You can use the Time to live (TTL) feature available in ADF and Synapse.

From the ADF pipeline designer UI, go to Connections > Integration Runtimes > New. Select Azure IR and then open the Data Flow Run Time properties section.

Specifying a time to live value keeps a cluster alive for a certain period of time after its execution completes. If a new job starts using the IR during the TTL window, it will reuse the existing cluster and the startup time will be greatly reduced. After the second job completes, the cluster will again stay alive for the TTL period.
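To make the effect on the loop from the question concrete, here is a rough back-of-the-envelope sketch. It assumes the iterations run one after another on the same IR and that each next run starts within the TTL window. The 3-minute cold start comes from the question; the warm start value of 0.5 minutes is a purely hypothetical placeholder, not a measured or documented figure.

```python
def startup_overhead_minutes(iterations, cold_start=3.0, warm_start=0.5, ttl_enabled=True):
    """Estimate total cluster startup time across sequential data flow runs.

    cold_start: minutes to spin up a new cluster (3 minutes in the question).
    warm_start: minutes to attach to a cluster kept alive by TTL
                (hypothetical value, measure it in your own workspace).
    """
    if iterations <= 0:
        return 0.0
    if not ttl_enabled:
        # Every run pays the full cold start.
        return iterations * cold_start
    # Only the first run pays the cold start; later runs reuse the cluster.
    return cold_start + (iterations - 1) * warm_start


without_ttl = startup_overhead_minutes(5, ttl_enabled=False)
with_ttl = startup_overhead_minutes(5, ttl_enabled=True)
print(f"Without TTL: {without_ttl} min, with TTL: {with_ttl} min")
```

Under these assumptions, only the first iteration pays the full startup cost; the actual savings depend on how quickly a warm cluster picks up the next run in your environment.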
Then, in the pipeline, open the Settings tab of the data flow activity and select the IR that you created with TTL.
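If you prefer to check the result in the JSON view rather than the UI, the two pieces could look roughly like the sketch below. This is only an illustration: the names `DataFlowIRWithTTL`, `RunMyDataFlow` and `MyDataFlow`, the 10-minute TTL, and the core count are hypothetical, and the property names reflect my understanding of the exported JSON, so please compare them with the definitions generated in your own workspace.

```python
import json

IR_NAME = "DataFlowIRWithTTL"  # hypothetical integration runtime name


def build_integration_runtime(name, ttl_minutes, core_count=8, compute_type="General"):
    """Build an Azure IR definition whose data flow cluster stays alive for ttl_minutes."""
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": "AutoResolve",
                    "dataFlowProperties": {
                        "computeType": compute_type,
                        "coreCount": core_count,
                        "timeToLive": ttl_minutes,  # TTL in minutes
                    },
                }
            },
        },
    }


def build_data_flow_activity(activity_name, data_flow_name, ir_name):
    """Build a data flow activity that runs on the named IR instead of the default one."""
    return {
        "name": activity_name,
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataflow": {"referenceName": data_flow_name, "type": "DataFlowReference"},
            "integrationRuntime": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


ir = build_integration_runtime(IR_NAME, ttl_minutes=10)
activity = build_data_flow_activity("RunMyDataFlow", "MyDataFlow", IR_NAME)
print(json.dumps(ir, indent=2))
print(json.dumps(activity, indent=2))
```

The point to verify is that the activity's integration runtime reference names the IR that carries the TTL setting; if the activity still points at the default IR, each run starts a fresh cluster.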

For more information, see the following resource: TTL to reduce Data Flow activity times

Hope this helps. Please let us know if you have any further queries.
