Hi,
I have a question about the merge copy behavior option when using a data lake as the sink in a copy activity.
Let's assume we have the following two CSV files:
File 1:
1, John
2, Sarah
File 2:
3, Bob
4, Janet
If I pick these two files as my source and merge them into a file named FinalFile.csv, I understand that the rows may not end up in sequential order. For example, the output could look like:
1, John
3, Bob
2, Sarah
4, Janet
My understanding is that the merge behavior doesn't guarantee that files are merged sequentially, so records from different files may be interleaved.
Now let's say that I pick File2 as my source and File1 as my sink. I'd expect the final output to be:
1, John
2, Sarah
3, Bob
4, Janet
since the sink file already has content in it and we're merging just one source file into it. However, it seems that only the rows from File2 end up in the output, and the rows that were already in File1 are gone. In other words, it appears to overwrite rather than append.
Question: given the above, is there any way to guarantee the order of the records using the copy activity? I'm aware that I could use a data flow or an Azure Function to achieve this, but I'd rather not go down that route unless there's no other option. One potential option (which I haven't tried yet) is to give the copy activity source a file containing the list of files to copy and set the degree of parallelism to 1, hoping that it merges the files in the order they appear in that list. However, I'm not really keen on this option.
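To make the result I'm after concrete, here is a minimal local sketch in Python. It is not Data Factory code and doesn't claim to show how the copy activity works internally; `merge_in_order` is a hypothetical helper I made up purely to illustrate the two behaviors I expected (ordered merge into a new file, and append into an existing one).

```python
import tempfile
from pathlib import Path


def merge_in_order(sources, sink, append=False):
    """Concatenate the source files into sink, strictly in the order given.

    Hypothetical helper for illustration only; not part of any Azure SDK.
    """
    mode = "a" if append else "w"  # "a" keeps existing sink rows, "w" overwrites
    with open(sink, mode, encoding="utf-8") as out:
        for src in sources:
            text = Path(src).read_text(encoding="utf-8")
            # Make sure each file ends with a newline so rows don't run together.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)


# Recreate the two sample files from the question in a temporary folder.
workdir = Path(tempfile.mkdtemp())
file1 = workdir / "File1.csv"
file2 = workdir / "File2.csv"
file1.write_text("1, John\n2, Sarah\n", encoding="utf-8")
file2.write_text("3, Bob\n4, Janet\n", encoding="utf-8")

# Scenario 1: merge both files into FinalFile.csv, File1 first, then File2.
final = workdir / "FinalFile.csv"
merge_in_order([file1, file2], final)
print(final.read_text(encoding="utf-8"))

# Scenario 2: File2 as source, File1 as sink, appending instead of overwriting.
merge_in_order([file2], file1, append=True)
print(file1.read_text(encoding="utf-8"))
```

In both scenarios the rows come out as 1, 2, 3, 4, which is the ordering I'd like the copy activity to guarantee.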
Thanks
Pedro