ADF Copy Data: Timeout after about an hour copying from ADLS Gen2 to ADLS Gen2

Sebastien Lessard 6 Reputation points
2021-11-17T19:00:40.747+00:00

Hi,

We are setting up backups of some of our ADLS Gen2 storage accounts based on this blog article: https://cloudblogs.microsoft.com/industry-blog/en-gb/technetuk/2021/08/17/backup-your-data-lake-using-azure-data-factory-metadata-copy-activity/

The pipeline created is using a Copy Data activity to copy the content from a source ADLS Gen 2 storage account to a destination ADLS Gen2 storage account.

We set things up and ran copy activities, and initially things looked promising. But when we ran the copy on our largest storage account, the Copy Data activity always failed after roughly an hour with this error:

Failure happened on 'Sink' side. ErrorCode=AdlsGen2TimeoutError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Request to ADLS Gen2 account '<storage_account_name_redacted>' met timeout error. It is mostly caused by the poor network between the Self-hosted IR machine and the ADLS Gen2 account. Check the network to resolve such error. ,Source=Microsoft.DataTransfer.ClientLibrary,'

Ran it more than once and it always fails.

Here are some things I have tried:
-Configured "Retry" and "Retry Interval": Failed. From what I can see, it looks like the entire copy is starting over and failing again after an hour when I do this (but that is my interpretation, could be wrong).
-Set up a custom integration runtime, with higher core count (16), located in the same data center as the source storage account (Canada Central): Failed.
-Set up a custom integration runtime, with higher core count (16), located in the same data center as the destination storage account (East US): Failed.

Is there really a one-hour timeout interrupting my transfer? If so, is it configurable? I looked but could not find a timeout setting that looks like it (there is one on the Copy Data activity, but it is still set to its default of 7 days).

Any other ideas? My next attempt will be to break the job down into smaller segments, for example by going down to the folder level on the storage account. But from a backup perspective this is very risky, as any new folder requires a modification to the backup job to include it.
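One way the "new folder" risk might be reduced is to derive the folder list at run time instead of hard-coding it. The sketch below is only an illustration of that idea in Python: it assumes a flat listing of paths is already available (in ADF this could come from a Get Metadata activity's childItems feeding a ForEach; that wiring is not shown), and the function name `group_by_top_level_folder` is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_top_level_folder(paths):
    """Group file paths by their first path segment.

    Files sitting directly at the container root are grouped under "".
    Each group could then be copied as its own, smaller job.
    """
    groups = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path.strip("/")).parts
        if not parts:
            continue  # skip empty entries such as "/" or ""
        key = parts[0] if len(parts) > 1 else ""
        groups[key].append(path)
    return dict(groups)


# Hypothetical listing; real paths would come from the storage account.
listing = [
    "raw/2021/a.csv",
    "raw/2021/b.csv",
    "curated/x.parquet",
    "readme.txt",
]
for folder, files in group_by_top_level_folder(listing).items():
    print(folder or "<root>", len(files))
```

Because the groups are computed from the listing on every run, a folder created after the backup job was set up would be picked up without editing the job.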

Your help is appreciated.

Regards,

Sebastien

Azure Data Factory

1 answer

  1. MartinJaffer-MSFT 26,031 Reputation points
    2021-11-18T18:58:24.907+00:00

    Hello @Sebastien Lessard and welcome to Microsoft Q&A! This sounds frustrating.

    A note and a question.

    First, when you are doing Azure cloud -> Azure cloud copy like you are with ADLS gen2, a Self-Hosted Integration Runtime is unnecessary. Even more, by making the traffic leave Azure and then go back into Azure, you are increasing your costs, and most likely slowing down your Copy. It might also cause other issues.
    So when doing ADLS -> ADLS, use Azure Integration Runtime, not Self-Hosted.

    Onto the timeout issue. There is the 7 day default timeout you found, but I am not aware of any other limit in Copy Activity. I have seen copies that go for 7 hours before. So I will try to imagine other causes.

    How are you authenticating to the Storage account? SAS tokens can expire (though I am not sure an expired token would look like this).
    What are the parallelism settings? If the parallelism is set too high, it might be causing a traffic jam.
    Storage account does have throttling limits, but that is hard to hit.

    Do any of these seem plausible to you? It may be worth going to the storage account or Vnet and checking the logs.
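If SAS authentication is in use, the expiry question above can be checked quickly from the token itself. The following is a minimal Python sketch, assuming the token carries the standard `se` (signed expiry) field in ISO 8601 UTC form; the helper names `sas_expiry` and `expires_within` and the sample token are hypothetical, and whether an expired token would actually surface as AdlsGen2TimeoutError is not confirmed.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs


def sas_expiry(sas_token):
    """Return the expiry time from the `se` field of a SAS token, in UTC.

    Accepts a bare token, a token with a leading "?", or a full URL.
    """
    query = sas_token.split("?", 1)[-1]
    params = parse_qs(query)
    if "se" not in params:
        raise ValueError("no 'se' (signed expiry) field found in token")
    raw = params["se"][0]
    # A trailing "Z" is not accepted by fromisoformat before Python 3.11.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    expiry = datetime.fromisoformat(raw)
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # SAS times are UTC
    return expiry


def expires_within(sas_token, run_duration, now=None):
    """True if the token would expire before a run of `run_duration` ends."""
    now = now or datetime.now(timezone.utc)
    return sas_expiry(sas_token) < now + run_duration


# Made-up token for illustration only; the signature is redacted.
token = "sv=2020-08-04&sp=rl&se=2021-11-17T20%3A00%3A00Z&sig=REDACTED"
start = datetime(2021, 11, 17, 19, 0, tzinfo=timezone.utc)
print(expires_within(token, timedelta(hours=2), now=start))
```

A result of True for the expected copy duration would make the token lifetime worth extending before investigating the network or throttling.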