Thanks @MartinJaffer-MSFT. I introduced an ARM template parameter and used it to set the retry count for all the applicable activities.
Timeout error when getting the status of a pipeline
I have one pipeline that calls another. The parent pipeline then waits for the child pipeline to complete by calling the REST API with the run ID of the child pipeline. Occasionally this call times out with the following error message:
Error calling the endpoint 'https://management.azure.com/subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.DataFactory/factories/zzz/pipelineruns/c9c51bd0-5c61-4a62-b220-df253faa829f?api-version=2018-06-01'. Response status code: 'RequestTimeout'. More details:Exception message: 'A task was canceled.'. Url endpoint request timed out. Please make sure the endpoint response is within 1 minute and retry.
Can you please let me know why this is happening and how it can be avoided?
A couple of points to note:
- The call succeeds 99% of the time; the timeout occurs only rarely.
- The child pipeline is started through the REST API (createRun) because the name of the child pipeline is dynamic, so ExecutePipeline is not used.
Kothai Ramanathan 941 Reputation points Microsoft Employee
2021-01-20T04:12:18.637+00:00
2 additional answers
Kothai Ramanathan 941 Reputation points Microsoft Employee
2021-01-15T07:06:39.377+00:00
Thank you @MartinJaffer-MSFT for the details. Would increasing the retry count solve this problem?
MartinJaffer-MSFT 26,086 Reputation points
2021-01-14T19:19:52.547+00:00
Hello @Kothai Ramanathan and welcome back to Microsoft Q&A.
Given that the failure rate is 1%, the simplest solution is to add redundancy, as shown below. If the two attempts fail independently of each other, this reduces the failure rate to 1% of 1% (0.01%).
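The arithmetic behind that estimate can be checked with a few lines of Python. This is only a sketch: the function name `combined_failure_rate` is made up for illustration, and it assumes each attempt fails independently with the same probability, which may not hold if the timeouts are caused by a sustained service slowdown.

```python
def combined_failure_rate(per_attempt_failure: float, attempts: int) -> float:
    """Probability that every attempt fails, assuming independent attempts."""
    if not 0.0 <= per_attempt_failure <= 1.0:
        raise ValueError("per_attempt_failure must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # All attempts must fail for the overall call to fail.
    return per_attempt_failure ** attempts


if __name__ == "__main__":
    # 1% per-attempt failure rate, as reported in the question.
    for attempts in (1, 2, 3):
        rate = combined_failure_rate(0.01, attempts)
        print(f"{attempts} attempt(s): {rate:.4%} chance of overall failure")
```

Under the same assumption, each additional attempt multiplies the overall failure rate by the per-attempt rate again.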
In this diagram there are two web activities and a wait activity. The web activities are two attempts at the call that times out. The wait activity is a placeholder for whatever comes afterwards.
The first web activity (attempt1) is connected to the second web activity (attempt2) by a red on-failure dependency.
The first web activity (attempt1) is connected to the wait activity by a blue on-completion dependency.
The second web activity (attempt2) is connected to the wait activity by a green on-success dependency and a grey skipped dependency.
If the first web activity fails or times out, then the second web activity runs.
If the second web activity fails or times out, then the wait activity does not run, and the pipeline fails.
If the second web activity succeeds, then the wait activity runs and the pipeline succeeds.
An investigation into why the call times out requires a deeper look than I can provide here. If you still want a root cause analysis, let me know and I can give you a one-time free support ticket.
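For readers who want to see the control flow of the attempt1/attempt2 pattern outside the pipeline designer, here is a minimal Python sketch. It is not Data Factory code: `get_status_with_fallback` and the `fetch_status` callable are hypothetical names, and `fetch_status` stands in for whatever issues the pipeline run status request.

```python
from typing import Callable, Optional


def get_status_with_fallback(fetch_status: Callable[[], str], attempts: int = 2) -> str:
    """Return the first successful result of fetch_status, trying up to `attempts` times.

    Mirrors the attempt1 -> attempt2 chain: a later attempt runs only if the
    previous one failed, and if the last attempt fails the error propagates.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return fetch_status()  # success: continue to the next step
        except Exception as error:  # failure or timeout: fall through to the next attempt
            last_error = error
    assert last_error is not None
    raise last_error  # every attempt failed, so the caller fails too


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_fetch() -> str:
        # Hypothetical stand-in: times out on the first call, succeeds on the second.
        calls["count"] += 1
        if calls["count"] == 1:
            raise TimeoutError("simulated request timeout")
        return "Succeeded"

    print(get_status_with_fallback(flaky_fetch))
```

The loop is the generalised form of the diagram: with `attempts=2` it behaves like the two chained web activities, and raising the count corresponds to adding further fallback attempts.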