Timeout error when getting the status of a pipeline

Kothai Ramanathan 941 Reputation points Microsoft Employee
2021-01-14T08:23:53.58+00:00

I have one pipeline that calls another. After starting it, the parent pipeline waits for the child pipeline to complete by calling the REST API with the child pipeline's run ID. Occasionally this call times out with the following error message:

Error calling the endpoint 'https://management.azure.com/subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.DataFactory/factories/zzz/pipelineruns/c9c51bd0-5c61-4a62-b220-df253faa829f?api-version=2018-06-01'. Response status code: 'RequestTimeout'. More details:Exception message: 'A task was canceled.'. Url endpoint request timed out. Please make sure the endpoint response is within 1 minute and retry.

Can you please let me know why this is happening and how it can be avoided?

A couple of points to note:

  1. The call succeeds about 99% of the time; the timeout happens only rarely.
  2. The child pipeline is started through the REST API (createRun) because the name of the child pipeline is dynamic. Hence the ExecutePipeline activity is not used.
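
For illustration only, the wait-for-completion step described above can be sketched outside Data Factory. The Python below assumes a hypothetical `fetch_status` callable that performs the GET on the `pipelineruns` endpoint and returns the run's status string. The name `wait_for_run`, the polling interval, and the timeout tolerance are likewise assumptions and not part of the original pipeline. It shows one way to keep polling through an occasional timed-out call instead of failing on the first one.

```python
import time
from typing import Callable

# Status strings assumed to mark a finished pipeline run.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_run(
    fetch_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_consecutive_timeouts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal status.

    fetch_status is a hypothetical callable that queries the run and
    raises TimeoutError when the request times out.
    """
    timeouts = 0
    while True:
        try:
            status = fetch_status()
            timeouts = 0  # a successful call resets the counter
        except TimeoutError:
            timeouts += 1
            if timeouts > max_consecutive_timeouts:
                raise  # give up after too many timeouts in a row
            sleep(poll_seconds)
            continue
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
```

A single timed-out status call is treated here as "status unknown, ask again" rather than as a failure of the child run, which is the distinction the error message in the question blurs.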
Azure Data Factory

Accepted answer
  1. Kothai Ramanathan 941 Reputation points Microsoft Employee
    2021-01-20T04:12:18.637+00:00

    Thanks @MartinJaffer-MSFT. I introduced an ARM template parameter and used it to set the retry count for all the applicable activities.

    2 people found this answer helpful.
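
The accepted answer does not show the template itself. As a rough sketch of the same idea using a different mechanism (a Python pre-processing step rather than the ARM template), the snippet below writes a retry count into the `policy` block of each top-level activity in a pipeline definition. The function name `set_retry`, the parameter name `activityRetryCount`, and the choice to treat "applicable" as "already has a `policy` block" are assumptions made for illustration. Nested activities, for example inside a ForEach, are not handled.

```python
import copy
from typing import Any, Dict, Union


def set_retry(
    pipeline: Dict[str, Any],
    retry: Union[int, str],
    interval_seconds: int = 30,
) -> Dict[str, Any]:
    """Return a copy of the pipeline with retry settings applied.

    retry may be an integer or an ARM template expression string such as
    "[parameters('activityRetryCount')]" (hypothetical parameter name).
    Only top-level activities that already have a policy block are changed.
    """
    updated = copy.deepcopy(pipeline)
    activities = updated.get("properties", {}).get("activities", [])
    for activity in activities:
        policy = activity.get("policy")
        if policy is not None:
            policy["retry"] = retry
            policy["retryIntervalInSeconds"] = interval_seconds
    return updated
```

Passing the expression string instead of a number leaves the actual count to be resolved at deployment time, which is closer to the parameterised approach the answer describes.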

2 additional answers

  1. Kothai Ramanathan 941 Reputation points Microsoft Employee
    2021-01-15T07:06:39.377+00:00

    Thank you @MartinJaffer-MSFT for the details. Would increasing the retry count solve this problem?

    1 person found this answer helpful.

  2. MartinJaffer-MSFT 26,061 Reputation points
    2021-01-14T19:19:52.547+00:00

    Hello @Kothai Ramanathan and welcome back to Microsoft Q&A.

    Given that the failure rate is about 1%, the simplest solution is to add redundancy, as shown below. Assuming the two attempts fail independently, this would reduce the failure rate to 1% of 1% (0.01%).
    [Image 56773-image.png: pipeline diagram with two web activities and a wait activity]

    In this diagram there are two web activities and a wait activity. The web activities are two attempts at the call that sometimes times out. The wait activity is a placeholder for whatever comes afterwards.
    The first web activity (attempt1) is connected to the second web activity (attempt2) by a red on-failure dependency.
    The first web activity (attempt1) is connected to the wait activity by a blue on-completion dependency.
    The second web activity (attempt2) is connected to the wait activity by a green on-success dependency and a grey on-skipped dependency.

    If the first web activity succeeds, then the second web activity is skipped and the wait activity runs.
    If the first web activity fails or times out, then the second web activity runs.
    If the second web activity fails or times out, then the wait activity does not run, and the pipeline fails.
    If the second web activity succeeds, then the wait activity runs and the pipeline succeeds.
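
The dependency rules above can be modelled in a few lines of Python, purely as an illustration of the control flow. The function names `run_activity`, `run_pipeline`, and `combined_failure_rate` are hypothetical, the two attempts are passed in as plain callables, and the failure-rate calculation assumes the attempts fail independently, which may not hold if a timeout has a shared cause.

```python
from typing import Callable, Dict


def run_activity(call: Callable[[], object]) -> str:
    """Run one attempt and map any exception to a Failed status."""
    try:
        call()
        return "Succeeded"
    except Exception:
        return "Failed"


def run_pipeline(
    attempt1: Callable[[], object],
    attempt2: Callable[[], object],
) -> Dict[str, str]:
    status = {"attempt1": run_activity(attempt1)}
    # attempt2 depends on attempt1 through the on-failure condition.
    if status["attempt1"] == "Failed":
        status["attempt2"] = run_activity(attempt2)
    else:
        status["attempt2"] = "Skipped"
    # The wait activity needs attempt1 to have completed (always true here)
    # and attempt2 to have either succeeded or been skipped.
    if status["attempt2"] in ("Succeeded", "Skipped"):
        status["wait"] = "Succeeded"
        status["pipeline"] = "Succeeded"
    else:
        status["wait"] = "Skipped"
        status["pipeline"] = "Failed"
    return status


def combined_failure_rate(p: float, attempts: int = 2) -> float:
    """Probability that every attempt fails, assuming independence."""
    return p ** attempts
```

With `p = 0.01` and two attempts, `combined_failure_rate` gives 0.0001, which is the 0.01% figure quoted in the answer.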

    Investigating why the call times out requires a deeper look than I can provide here. If you still want a root cause analysis, let me know and I can give you a one-time free support ticket.
