Costs of foreach activity with a lot of small copy activities

Niels Deckers 21 Reputation points
2022-05-09T18:05:54.613+00:00

Hi all,

I am currently working on a pipeline that gets data from a REST API.
First I get certain IDs with a Lookup activity, and then I use those IDs in a ForEach activity to do all the GET requests with a Copy activity.
Something like: https://api-base-url.nl/product/[ID]
For these simple calls, which return small amounts of JSON data, I end up with 228 activity runs and 3.76 DIU-hours.
That feels expensive considering the whole job runs in about a minute when I do it locally with Python.
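Roughly, that local script is just a loop of small GET requests over the IDs. A minimal sketch (the list endpoint and field names are placeholders, not the real API):

```python
import requests

BASE_URL = "https://api-base-url.nl"  # base URL from the question

# Placeholder for the Lookup step: fetch the IDs first (hypothetical endpoint)
ids = requests.get(f"{BASE_URL}/products", timeout=30).json()

# One small GET per ID, the equivalent of the ForEach + Copy activity
results = []
for product_id in ids:
    response = requests.get(f"{BASE_URL}/product/{product_id}", timeout=30)
    response.raise_for_status()
    results.append(response.json())
```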
I already lowered the DIUs to 2 in the settings of the Copy activity, but I still feel like this operation is more expensive than it should be.
Does anybody have tips on how to cut costs for this pipeline? Thanks in advance!

Azure Data Factory

Accepted answer
  1. MartinJaffer-MSFT 26,056 Reputation points
    2022-05-10T16:27:29.81+00:00

    Hello @Niels Deckers and welcome to Microsoft Q&A.

    While Data Factory is good at moving large amounts of data, it is not necessarily the best choice when making many tiny calls, as you have found.

    Out of curiosity, why have you chosen to run this in Data Factory, when you had a Python script to do the same?

    I think we can build a hybrid solution: part Data Factory, part Python. The Custom Activity runs on Azure Batch compute, which can execute Python among other things, and it can be fed datasets, or perhaps the output of the Lookup activity. I'm not 100% certain, but I think that single activity would have less overhead than spinning up 228 separate activity runs.
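    A minimal sketch of what the script behind such a Custom Activity could look like. It assumes the Lookup output is handed over as a JSON string in the activity's extendedProperties (named productIds here, an illustrative name), which the service serializes into activity.json in the Batch task's working directory:

    ```python
    import json
    import requests

    BASE_URL = "https://api-base-url.nl"  # same placeholder API as in the question

    # The Custom Activity drops activity.json (plus linkedServices.json and
    # datasets.json) into the task's working directory. The exact shape of
    # extendedProperties is an assumption: the pipeline is expected to pass the
    # Lookup output as a JSON string named "productIds".
    with open("activity.json") as f:
        activity = json.load(f)

    ids = json.loads(activity["typeProperties"]["extendedProperties"]["productIds"])

    # All the small GET calls happen inside this one activity run
    results = []
    for product_id in ids:
        response = requests.get(f"{BASE_URL}/product/{product_id}", timeout=30)
        response.raise_for_status()
        results.append(response.json())

    # Write the combined result somewhere the rest of the pipeline can pick it up
    with open("products.json", "w") as f:
        json.dump(results, f)
    ```

    That way the 228 per-ID Copy runs collapse into a single Custom Activity run, and one follow-up Copy (or the script itself, writing to storage) can land the combined output in your sink.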

    Does this appeal to you?

    1 person found this answer helpful.

0 additional answers
