Data Factory ForEach performance questions with REST API GET requests

Brian Behnke 106 Reputation points
2020-06-11T15:26:16.527+00:00

I have a pipeline that has a Lookup activity that is querying a single column of data from an Azure SQL table, fetching the first 1000 rows of data.
9901-api-pipeline.jpg

9788-sql-query.jpg

This input is then being used in a ForEach activity in order to use the values in a string concat for the REST API URL(Power BI API).

9794-foreach-settings.jpg

9795-rest-config.jpg

9796-dataset-properties.jpg

This process is working, just slower than I was expecting. The first 5 or so iterations through the ForEach activity are completing within 1-7 seconds, but after that each iteration is taking 30 seconds or more. It is completing the 1000 iterations in about 45-50 minutes. With having over 10,000 records I want to iterate over, this will take several hours to complete.

9857-copy-duration.jpg

Is my pipeline inefficiently built or is this an inherent issue with REST API iterations through a ForEach activity? The ForEach is using the default batch count(which I believe is 20, and that is what I am seeing being queued up in the pipeline output when running) but is it actually doing 20 API calls concurrently or is there some other limitation on this procedure? Is the bottleneck on the PowerBI API side of this?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,898 questions
{count} votes

Accepted answer
  1. Brian Behnke 106 Reputation points
    2020-06-25T20:15:25.637+00:00

    As it turns out, with a little more investigation and reading the documentation, I found this:

    10724-throttled.jpg

    This certainly explains the most, if not all, of the slowness of the API requests. Thanks for helping me try to solve this issue. I appreciate your time and attention to this issue.

    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. HimanshuSinha-msft 19,386 Reputation points Microsoft Employee
    2020-06-12T18:19:05.51+00:00

    Hello @BrianBehnke-8306 ,

    Thanks for the question . Since initially the performanace is good and then it went bad , there could be an sceanrio that the the calls in the remote servers are getting queued , but at this time its just an assumption .

    Can i request you to batch the calls ? I see that the batch count at this time is blank , can you please try with 10 and see how its goes and increment by a factor of 5 to find a sweet spot . Please do let me know how it goes .

    Thanks & stay safe

    Himanshu


  2. Bamak Mrbi 31 Reputation points
    2020-07-10T09:22:48.267+00:00

    Hello @BrianBehnke-1327,

    Thanks a lot for this post who calls me.
    Unfortunatly , I m facing same "problem" with a foreach activity on 1000 item and copy a CSV from Sales force CC REST API.

    Bad responce time after a moment

    Did your found please a solution , please ?
    Any advise ?

    I think about putting a WAIT every 20 requests in order to reduce the queue of requests
    (for each) Sequential property is false, but I don't think it has an impact

    I can't do anything on the SFCC side, it's a different team than BI

    Best regards

    Christophe


    "inputs": [
    {
    "referenceName": "HTTP_JSON",
    "type": "DatasetReference",
    "parameters": {
    "URLDynamique": "@concat('/s/XXXX/dw/shop/v19_8/orders/',pipeline().parameters.Ordernumber) "
    }
    }
    ],
    "outputs": [
    {
    "referenceName": "BLOB_FILE_CSV",
    "type": "DatasetReference",
    "parameters": {
    "filename": "Orderstaging.csv"
    }