Hi,
Thanks for reaching out to Microsoft Q&A.
There are a few factors and techniques that can increase the likelihood that newly arrived CSV files will reuse an already active Spark cluster in Synapse Data Flows. However, be aware that given Synapse's orchestration logic, there is no absolute guarantee that the same cluster will always be reused.
- Batch or sequentially handle new CSV files rather than firing up a separate pipeline run per file drop.
- Restrict trigger and pipeline concurrency so that you do not spin up multiple Data Flows in parallel; fewer parallel runs means the same cluster can be reused.
- Point all Data Flow executions to the same Azure Integration Runtime (IR) with an appropriate time to live (TTL).
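As a rough sketch, the TTL is configured on the Azure IR's data flow compute properties. An IR definition with a 60-minute TTL might look like the following (the IR name, core count, and compute type are illustrative; adjust them to your workload):

```json
{
  "name": "DataFlowAzureIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 8,
          "timeToLive": 60
        }
      }
    }
  }
}
```

Every Data Flow activity you want to share the warm cluster should reference this same IR; pointing different activities at different IRs (or at auto-resolve defaults with no TTL) forces new cluster startups.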
With these steps, you significantly increase the chance that new CSV files are processed by the already active Spark cluster, taking advantage of the 1-hour TTL whenever possible.
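For the concurrency restriction, a simple option is the pipeline-level `concurrency` property, which caps the number of simultaneous runs of that pipeline (additional triggered runs are queued). A minimal pipeline definition sketch, with an illustrative name and the activities omitted:

```json
{
  "name": "ProcessNewCsvFiles",
  "properties": {
    "concurrency": 1,
    "activities": []
  }
}
```

With `concurrency` set to 1, CSV files that arrive in quick succession are processed one run after another, so each run has the best chance of landing on the cluster the previous run left warm.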
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.