Data Factory REST API copy in ForEach

Maria Valek 80 Reputation points
2023-04-21T15:52:15.9933333+00:00

Hi, I am building a pipeline where:

  1. A Web activity gets a list of group ids and display names (filtered); this will be 50+ groups.
  2. A ForEach (over that output) calls a REST API in a Data Factory Copy activity for the delta group: '/groups/delta/?$filter= id eq '
  3. The result is saved in blob storage; a sketch of the setup follows this list. The directory is defined as @concat(item().displayName,'/',formatDateTime(utcnow(),'yyyy/MM/dd/')) and the file name as @concat(item().displayName,'_',formatDateTime(utcnow(),'yyyyMMdd_HHmm_ssffffff'),'.json') (item() refers to the current ForEach item).

What keeps happening is that when running it in parallel, the output is all over the place. Some displayNames are duplicated in the individual blob folders, with just the timestamp different, and some (seemingly) don't get copied at all. I suspect they are actually copied but saved under the wrong name, but I am confused about how this is possible, since I run each call inside a ForEach activity. I encountered the same issue loading Microsoft Defender with only 2 feeds: they either both got saved with the same name with different timestamps, or in a different folder, or contained the data from the other source.
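For reference, the ForEach is wired up roughly like this (a sketch only: the activity and dataset parameter names here are illustrative, not my exact ones, and the Copy source side is omitted):

    {
        "name": "ForEachGroup",
        "type": "ForEach",
        "dependsOn": [
            { "activity": "GetGroupIds", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "items": {
                "value": "@activity('GetGroupIds').output.value",
                "type": "Expression"
            },
            "activities": [
                {
                    "name": "CopyGroupDelta",
                    "type": "Copy",
                    "outputs": [
                        {
                            "referenceName": "BlobJsonSink",
                            "type": "DatasetReference",
                            "parameters": {
                                "folderPath": "@concat(item().displayName,'/',formatDateTime(utcnow(),'yyyy/MM/dd/'))",
                                "fileName": "@concat(item().displayName,'_',formatDateTime(utcnow(),'yyyyMMdd_HHmm_ssffffff'),'.json')"
                            }
                        }
                    ]
                }
            ]
        }
    }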


Has anybody experienced something like this? I cannot figure out why it would do that. I am testing running it sequentially (the ForEach settings involved are sketched below), but that takes a very long time, even with just 51 API calls. Any advice would be appreciated; I cannot really find any documentation about loading MDE or Graph data with the Data Factory REST API. Thank you, Maria
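For context, parallel vs. sequential execution is controlled on the ForEach itself; these are the two settings I am toggling (the values shown are, as far as I know, the defaults):

    "typeProperties": {
        "isSequential": false,
        "batchCount": 20,
        "items": {
            "value": "@activity('GetGroupIds').output.value",
            "type": "Expression"
        }
    }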

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
Microsoft Security | Microsoft Graph

Answer accepted by question author
  AnnuKumari-MSFT 34,566 Reputation points, Microsoft Employee, Moderator
    2023-04-25T16:36:28.9633333+00:00

    Hi Maria Valek, welcome to the Microsoft Q&A platform, and thanks for posting your question here. It sounds like you are experiencing issues with the parallelism of your pipeline. It's possible that multiple instances of the Copy activity are trying to write to the same blob storage location simultaneously, causing conflicts, incorrect naming, or missing files.

    Instead of a timestamp, you can try distinguishing the sink file names with unique identifier values, which can be generated with the guid() function.
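    For example, a file name expression along these lines (a sketch, keeping the displayName prefix from your pipeline):

        @concat(item().displayName,'_',guid(),'.json')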

    Kindly try if this works fine for you. Else, try making the timestamp value more precise (as sketched in the note below) so that it doesn't generate the same sink file name for two instances of the iteration. Otherwise, running sequentially would be the only way out. Hope it helps. Kindly let us know how it goes. Thank you.
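    To illustrate the more-precise-timestamp option: formatDateTime() accepts .NET format strings, which go up to seven fractional-second digits. This narrows the collision window but does not fully eliminate it:

        @concat(item().displayName,'_',formatDateTime(utcnow(),'yyyyMMdd_HHmmss_fffffff'),'.json')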

    1 person found this answer helpful.
