Hi Jatinder Luthra,
Thank you for using the Microsoft Q&A platform and for posting your question here.
As I understand your question, you are trying to merge a set of files into one file using the Copy activity in an ADF pipeline. Please let me know if that is not the correct understanding of your query.
You can leverage a wildcard file pattern together with a pipeline parameter to achieve this requirement. However, I would stress that the source file extensions, i.e. .parquet_1, .parquet_2, etc., are not valid file extensions; tools won't be able to read the files correctly because the extension is unrecognized.
That said, the following approach to merging the files should work for your scenario:
- Create a pipeline parameter, say 'param'.
- Drag a Copy activity into your ADF pipeline.
- In the source, select 'Wildcard file path' as the file path type and use this expression as the wildcard file path:
data_0_@{pipeline().parameters.param}_0.snappy.parquet_*
- In the sink dataset, create a parameter named 'outputfilename', point the dataset to the output container, and use the created parameter as the file name via this expression:
@dataset().outputfilename
- In the sink settings of the Copy activity, provide this expression as the parameter value:
data_0_@{pipeline().parameters.param}_0.snappy.parquet
- Change the copy behavior to 'Merge files'.
- Execute the pipeline, providing the param value as 0, 1, 2, 3 in successive runs.
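Putting the steps above together, the Copy activity's JSON definition might look roughly like the sketch below. This is only an illustration of the pattern, not your exact pipeline: the activity name and the dataset references 'SourceParquetDS' and 'SinkParquetDS' are placeholders, and the store settings assume Azure Blob Storage as the source and sink store.

```json
{
    "name": "MergeParquetFiles",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "ParquetSource",
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                "recursive": true,
                "wildcardFileName": {
                    "value": "data_0_@{pipeline().parameters.param}_0.snappy.parquet_*",
                    "type": "Expression"
                }
            }
        },
        "sink": {
            "type": "ParquetSink",
            "storeSettings": {
                "type": "AzureBlobStorageWriteSettings",
                "copyBehavior": "MergeFiles"
            }
        }
    },
    "inputs": [
        { "referenceName": "SourceParquetDS", "type": "DatasetReference" }
    ],
    "outputs": [
        {
            "referenceName": "SinkParquetDS",
            "type": "DatasetReference",
            "parameters": {
                "outputfilename": {
                    "value": "data_0_@{pipeline().parameters.param}_0.snappy.parquet",
                    "type": "Expression"
                }
            }
        }
    ]
}
```

Note how dynamic values are wrapped in an object with "type": "Expression", which is how the ADF authoring UI serializes dynamic content behind the scenes.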
Note: Please make sure that the schemas of the files being merged are the same. The column names, column order, and number of columns must match for the merge to succeed.
For the repro, I used CSV files instead of Parquet files.
Hope this helps. Kindly accept the answer by clicking the 'Accept answer' button and take the survey to mark the answer as helpful. Thank you.