Data Factory REST API copy in ForEach

Maria Valek 80 Reputation points
2023-04-21T15:52:15.9933333+00:00

Hi, I am building a pipeline where:

  1. A Web activity gets a list of group ids and display names (filtered); this will be 50+ groups.
  2. A ForEach (over that output) calls a REST API in a Data Factory Copy activity for the delta group: '/groups/delta/?$filter= id eq '
  3. The result is saved in blob storage; a sketch of the setup follows this list. The directory is defined as @concat(item().displayName,'/',formatDateTime(utcnow(),'yyyy/MM/dd/')) and the file name as @concat(item().displayName,'_',formatDateTime(utcnow(),'yyyyMMdd_HHmm_ssffffff'),'.json') (item() refers to the current ForEach item).

What keeps happening is that when running it in parallel, the output is all over the place. Some displayNames are duplicated in the individual blob folders, with just the timestamp different, and some (seemingly) don't get copied at all. I suspect they are actually copied but saved under the wrong name, but I am confused about how this is possible, since I run each call inside a ForEach activity. I encountered the same issue loading Microsoft Defender with only 2 feeds: they either both got saved with the same name with different timestamps, or in a different folder, or contained the data from the other source.
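For reference, the ForEach is wired up roughly like this (a sketch only: the activity and dataset parameter names here are illustrative, not my exact ones, and the Copy source side is omitted):

    {
        "name": "ForEachGroup",
        "type": "ForEach",
        "dependsOn": [
            { "activity": "GetGroupIds", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "items": {
                "value": "@activity('GetGroupIds').output.value",
                "type": "Expression"
            },
            "activities": [
                {
                    "name": "CopyGroupDelta",
                    "type": "Copy",
                    "outputs": [
                        {
                            "referenceName": "BlobJsonSink",
                            "type": "DatasetReference",
                            "parameters": {
                                "folderPath": "@concat(item().displayName,'/',formatDateTime(utcnow(),'yyyy/MM/dd/'))",
                                "fileName": "@concat(item().displayName,'_',formatDateTime(utcnow(),'yyyyMMdd_HHmm_ssffffff'),'.json')"
                            }
                        }
                    ]
                }
            ]
        }
    }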


Has anybody experienced something like this? I cannot figure out why it would do that. I am testing running it sequentially (the ForEach settings involved are sketched below), but that takes a very long time, even with just 51 API calls. Any advice would be appreciated; I cannot really find any documentation about loading MDE or Graph data with the Data Factory REST API. Thank you, Maria
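For context, parallel vs. sequential execution is controlled on the ForEach itself; these are the two settings I am toggling (the values shown are, as far as I know, the defaults):

    "typeProperties": {
        "isSequential": false,
        "batchCount": 20,
        "items": {
            "value": "@activity('GetGroupIds').output.value",
            "type": "Expression"
        }
    }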

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
Microsoft Security | Microsoft Graph

Answer accepted by question author
  AnnuKumari-MSFT 34,566 Reputation points, Microsoft Employee, Moderator
    2023-04-25T16:36:28.9633333+00:00

    Hi Maria Valek, welcome to the Microsoft Q&A platform, and thanks for posting your question here. It sounds like you are experiencing issues with the parallelism of your pipeline. It's possible that multiple instances of the Copy activity are trying to write to the same blob storage location simultaneously, causing conflicts, incorrect naming, or missing files.

    Instead of a timestamp, you can try distinguishing the sink file names with unique identifier values, which can be generated with the guid() function.
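    For example, a file name expression along these lines (a sketch, keeping the displayName prefix from your pipeline):

        @concat(item().displayName,'_',guid(),'.json')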

    Kindly try if this works fine for you. Else, try making the timestamp value more precise (as sketched in the note below) so that it doesn't generate the same sink file name for two instances of the iteration. Otherwise, running sequentially would be the only way out. Hope it helps. Kindly let us know how it goes. Thank you.
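    To illustrate the more-precise-timestamp option: formatDateTime() accepts .NET format strings, which go up to seven fractional-second digits. This narrows the collision window but does not fully eliminate it:

        @concat(item().displayName,'_',formatDateTime(utcnow(),'yyyyMMdd_HHmmss_fffffff'),'.json')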

    1 person found this answer helpful.
