Thanks for using Microsoft Q&A.
- Create a new Data Flow
You are going to create a very simple Data Flow just to leverage file partitioning. There will not be any column or row transformations, just a Source and a Sink that take a large file and produce smaller part files.
- Add a Source file
- Add a Sink folder
For the Sink dataset, choose the type of output files you would like to produce.
- In the Optimize tab of the Sink transformation, select the "Set Partitioning" radio button. You will be presented with a series of partitioning options.
- This is where you define how the partitioned files are generated. For an even distribution of rows across the output files, choose the Round Robin partition type and set the number of partitions (a short conceptual sketch of what Round Robin does follows these steps).
- Set the output file names using the “Pattern” naming option, for example "part[n].csv", which will produce "part1.csv", "part2.csv", etc.
- Notice I’ve also set “Clear the folder”. This tells ADF to wipe the contents of the destination folder before writing the new part files.
- Save your data flow and create a new pipeline.
- Add an Execute Data Flow activity and select your new file split data flow.
- Execute the pipeline using the pipeline debug button.
- You must execute data flows from a pipeline in order to generate file output. Debugging from the Data Flow canvas itself does not write any data.
- After execution, you should see the part files produced by round robin partitioning of your large source file. You’re done.
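If it helps to picture what Round Robin partitioning is doing, here is a minimal, stand-alone Python sketch (illustration only, not ADF code; the source file name and partition count are placeholders) that distributes rows from one large CSV cyclically across N part files named "part1.csv", "part2.csv", and so on. Because every Nth row lands in the same file, the part files come out roughly equal in size:

```python
import csv

# Placeholders for illustration only.
SOURCE_FILE = "large_input.csv"
NUM_PARTITIONS = 5  # same idea as the partition count in the Optimize tab

def round_robin_split(source_file: str, num_partitions: int) -> None:
    """Distribute rows cyclically across num_partitions part files."""
    # One writer per output partition: part1.csv, part2.csv, ...
    out_files = [open(f"part{i + 1}.csv", "w", newline="") for i in range(num_partitions)]
    writers = [csv.writer(f) for f in out_files]
    try:
        with open(source_file, newline="") as src:
            reader = csv.reader(src)
            header = next(reader)
            for w in writers:
                w.writerow(header)  # every part file keeps the header row
            for i, row in enumerate(reader):
                writers[i % num_partitions].writerow(row)  # round robin assignment
    finally:
        for f in out_files:
            f.close()

if __name__ == "__main__":
    round_robin_split(SOURCE_FILE, NUM_PARTITIONS)
```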
In the output of your pipeline debug run, you’ll see the execution results of the data flow activity. Click on the eyeglasses icon to show the details of your data flow execution, including statistics on how the records were distributed across your partitioned files.
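As an optional follow-up, if you want to trigger the pipeline and check its output programmatically rather than through the debug button and monitoring UI, here is a rough sketch using the Azure SDK for Python (azure-identity, azure-mgmt-datafactory, azure-storage-blob). Every value in angle brackets (subscription, resource group, factory, pipeline, storage account, container, sink folder) is a placeholder you would replace with your own:

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.storage.blob import BlobServiceClient

# Placeholder names - replace with your own values.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"
PIPELINE_NAME = "<pipeline-with-execute-data-flow>"
STORAGE_ACCOUNT_URL = "https://<storage-account>.blob.core.windows.net"
CONTAINER_NAME = "<container>"
OUTPUT_FOLDER = "<sink-folder>/"  # folder the Sink writes the part files to

credential = DefaultAzureCredential()

# Trigger the pipeline that wraps the Execute Data Flow activity.
adf_client = DataFactoryManagementClient(credential, SUBSCRIPTION_ID)
run = adf_client.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME)
print(f"Started pipeline run {run.run_id}")

# Poll until the run finishes.
while True:
    pipeline_run = adf_client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(30)
print(f"Pipeline run finished with status: {pipeline_run.status}")

# List the part files the Sink produced, with their sizes.
blob_service = BlobServiceClient(account_url=STORAGE_ACCOUNT_URL, credential=credential)
container = blob_service.get_container_client(CONTAINER_NAME)
for blob in container.list_blobs(name_starts_with=OUTPUT_FOLDER):
    print(f"{blob.name}: {blob.size} bytes")
```

The monitoring view in the portal described above gives you the same information per partition; the SDK route is just convenient for repeated or scheduled verification.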
Hope this helps. If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful". And if you have any further queries, do let us know.