[ADF] Azure Data Factory Copy Activity number of files

Aminouvic 1 Reputation point
2020-08-05T15:10:16.443+00:00

We have a simple ADF pipeline that copies data from an on-premises SQL Server database (through one IR node) to ADLS Gen2.

We are trying to reduce job duration by writing multiple files on the sink side, but the parallel copy and DIU settings seem to have no effect (the pipeline writes one big file).

How can we configure the number of files written to ADLS in this case?

Azure Data Factory

2 answers

  1. HarithaMaddi-MSFT 10,136 Reputation points
    2020-08-06T11:25:56.163+00:00

    Hi @Aminouvic ,

    Welcome to Microsoft Q&A Platform.

    Azure Data Factory can run a ForEach activity in parallel through the properties shown below. If isSequential is set to false, the activity iterates in parallel, with up to 20 concurrent iterations by default (the batchCount property sets this limit). The ForEach activity documentation describes these properties in more detail.
    16086-batchcount-foreach.png

    Below are sample runs I did for a couple of files with batch sizes of 1 and 2; the difference in execution time shows the files being loaded in parallel.
    16135-batchsize1-noparallelism.png
    16162-batchsize2-paralleltesting.png
    Hope this helps! Please let us know for further queries and we will be glad to assist.
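
For readers who want to try the ForEach approach above, here is a minimal sketch in Python that builds the JSON for a parallel ForEach activity, plus a list of key ranges that each iteration could copy to its own file. It rests on several assumptions: the source table has an integer key column, and the helper names (`split_ranges`, `build_foreach`), the `lower`/`upper`/`fileName` item fields, the `slices` pipeline parameter, and the activity names are all hypothetical. The inner Copy activity is only a placeholder; its source, sink, and datasets are left out.

```python
import json


def split_ranges(low, high, parts):
    """Split the inclusive integer range [low, high] into up to `parts` contiguous slices.

    Each slice is a dict that one ForEach iteration could use to filter the
    source query and name its output file (field names are hypothetical).
    """
    if parts < 1 or high < low:
        raise ValueError("need parts >= 1 and high >= low")
    base, extra = divmod(high - low + 1, parts)
    ranges = []
    start = low
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more parts than keys: skip empty slices
        end = start + size - 1
        ranges.append({"lower": start, "upper": end, "fileName": f"part-{i:03d}.csv"})
        start = end + 1
    return ranges


def build_foreach(items_param, batch_count, inner_activities):
    """Return a ForEach activity definition that iterates in parallel.

    batchCount is the upper bound on concurrent iterations; the ADF
    documentation lists 20 as the default and 50 as the maximum.
    """
    if not 1 <= batch_count <= 50:
        raise ValueError("batch_count must be between 1 and 50")
    return {
        "name": "CopySlicesInParallel",  # hypothetical activity name
        "type": "ForEach",
        "typeProperties": {
            "isSequential": False,
            "batchCount": batch_count,
            "items": {
                "value": f"@pipeline().parameters.{items_param}",
                "type": "Expression",
            },
            "activities": inner_activities,
        },
    }


if __name__ == "__main__":
    slices = split_ranges(1, 1_000_000, 4)
    # Placeholder only: a real Copy activity also needs source, sink, inputs, outputs.
    copy_stub = {"name": "CopyOneSlice", "type": "Copy"}
    print(json.dumps(slices, indent=2))
    print(json.dumps(build_foreach("slices", 4, [copy_stub]), indent=2))
```

With this layout, the number of output files follows the number of items passed to the ForEach, and `batchCount` only controls how many of them are copied at the same time.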


  2. HarithaMaddi-MSFT 10,136 Reputation points
    2020-09-16T08:58:10.18+00:00

    Hi @Aminouvic ,

    I want to share the new update from the ADF team that we were discussing in August. Please check this link, which gives more details on this property, and let us know if it helps in implementing your requirement.

    Thanks for your patience!
