Thanks for using Microsoft Q&A.
To iterate over the files and merge them in batches of 1000, you first need to write the names/paths of the files for each batch into a separate list file. Then use the List of files option in the ADF copy activity to merge the files named in each list.
Follow the process below (I tried it with 5 files in total, grouped into batches of 2):
1. Add a Data Flow activity and create a new data flow.
- In the source, add the dataset where all 10000 files are located. In Source options, set Wildcard paths to * and, in Column to store file name, enter a name for the file name column (filename in this example).
- Add an aggregate transformation. In Group by, select the filename column, and in Aggregates, apply any aggregate function to any column.
- Then add a derived column transformation to remove the leading slash (the first character) from the file name/path, using the expression
dropLeft(filename,1)
- In the sink, add the dataset where these list files should be stored, and uncheck First row as header
- Set Skip line count to 1
- In Settings, set the File name option to Pattern and set Pattern to file[1]000.csv
- In Mapping, select only the filename column.
- In Optimize, set the partitioning to Dynamic range based on the filename column and set the number of partitions (10 in your case)
- Output: the data flow writes a set of list files, each containing the names of the files that belong to one batch.
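As a rough local analogue of what the data flow in step 1 produces, the sketch below splits a list of file names into batches and writes one list file per batch. It is plain Python, not ADF. The function names `chunk` and `write_batch_lists`, the `file{n}.csv` output naming, and the sample input paths are assumptions made for illustration. Note also that it uses fixed-size chunks, whereas dynamic range partitioning in a data flow may not yield exactly equal batches.

```python
import tempfile
from pathlib import Path


def chunk(names, batch_size):
    """Split names into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]


def write_batch_lists(names, batch_size, out_dir):
    """Write one list file per batch (one file name per line); return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, batch in enumerate(chunk(sorted(names), batch_size)):
        path = out_dir / f"file{n}.csv"  # hypothetical naming, not the ADF pattern
        path.write_text("\n".join(batch) + "\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Five made-up file names grouped into batches of two, as in the test above.
    names = [f"input/data{i}.csv" for i in range(1, 6)]
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_batch_lists(names, 2, tmp):
            print(p.name, p.read_text(encoding="utf-8").split())
```

With 5 names and a batch size of 2, this writes three list files, the last one holding a single name.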
2. Then use a Get Metadata activity to get the list of list files (child items), where each file contains the names of the files that need to be merged.
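For orientation: when Child items is selected in the Field list, the Get Metadata activity returns an array of objects, each with a `name` and a `type`. The Python sketch below only mimics that shape with made-up sample values and picks out the file entries. The helper `list_file_names` and the sample names are hypothetical.

```python
def list_file_names(activity_output):
    """Return the names of entries of type 'File' from a Get Metadata-style output."""
    return [
        item["name"]
        for item in activity_output.get("childItems", [])
        if item.get("type") == "File"
    ]


if __name__ == "__main__":
    # Made-up sample mirroring the childItems shape (name/type pairs).
    sample_output = {
        "childItems": [
            {"name": "file0.csv", "type": "File"},
            {"name": "file1.csv", "type": "File"},
            {"name": "archive", "type": "Folder"},
        ]
    }
    print(list_file_names(sample_output))
```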
3. Then pass these files to a ForEach activity to loop over @activity('Get Metadata1').output.childItems.
- Inside the ForEach activity, add a Copy activity.
- Set File path type to List of files, and in Path to files list, use dynamic content to point to the list file of the current iteration.
- For the Copy activity source dataset, select the dataset that contains the files that need to be merged.
- Add the sink and set Copy behavior to Merge files.
- Output: each iteration merges all the files named in one of the list files produced by the data flow.
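The loop in step 3 can be pictured with the following plain-Python sketch. It is only a local analogue of the Copy activity's merge behavior, not ADF code: for each list file it reads the names and concatenates the contents of those files into one merged output. The function `merge_from_list`, the directory layout, and the `merged_` output naming are assumptions made for illustration, and header rows are not handled.

```python
import tempfile
from pathlib import Path


def merge_from_list(list_file, source_dir, merged_path):
    """Concatenate the files named in list_file (one relative path per line).

    Returns the number of files merged.
    """
    source_dir = Path(source_dir)
    lines = Path(list_file).read_text(encoding="utf-8").splitlines()
    names = [ln.strip() for ln in lines if ln.strip()]
    with open(merged_path, "w", encoding="utf-8") as out:
        for name in names:
            text = (source_dir / name).read_text(encoding="utf-8")
            out.write(text)
            # Keep rows from different files on separate lines.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src, lists, out = root / "src", root / "lists", root / "out"
        for folder in (src, lists, out):
            folder.mkdir()
        # Made-up sample data: three source files and two list files.
        (src / "a.csv").write_text("1,2\n", encoding="utf-8")
        (src / "b.csv").write_text("3,4\n", encoding="utf-8")
        (src / "c.csv").write_text("5,6\n", encoding="utf-8")
        (lists / "file0.csv").write_text("a.csv\nb.csv\n", encoding="utf-8")
        (lists / "file1.csv").write_text("c.csv\n", encoding="utf-8")
        # One merge per list file, like one Copy activity run per ForEach item.
        for list_file in sorted(lists.iterdir()):
            target = out / f"merged_{list_file.name}"
            count = merge_from_list(list_file, src, target)
            print(target.name, count)
```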
I hope this helps!
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.