I am thinking on below lines at the moment.
- Use "Get Metadata" activity with "ChildItems" to list all files for copy.
- Write this list of files to a Azure SQL table.
- Use a "Lookup" with "query" to get the list of distinct filenames without the timestamp.
So if you refer to screenshot of files, Lookup activity should get me following -
0CRM_SALES_ACT_1.json
CRM_0BP_DEF_ADDRESS_ATTR.json
ZDSOUTPAYMETH.json - Now I will iterate over this list one at a time and use a "Copy" activity with SOURCE File Path type selected as "Wildcard File Path" and pass <filename>_20*.json which will pick all copies of the <file>.
At the Sink tab of above "Copy" , I will choose "Merge files" for Copy behaviour and also pass the merged filename as <file>_<latest_timestamp>.json and write to another container of the storage.
- I will then use dataflow to remove duplicate rows and write to another container. From there I can do usual processing I have been doing .
Please do advise how this sounds, It alright then I need to find how to do all of the steps in ADF code.
Thanks
Vineet