Azure Data Factory - Retry for an insert Copy Activity (AzureSQL DB)

jldajo 26 Reputation points
2022-11-18T13:27:22.93+00:00

We’ve twice run into an intermittent issue where the copy activities fail with:

A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) On the next run, the issue is gone.

For SQL, say 100k records get written in 10k-record batches — will we end up with duplicate records if something fails in the middle of the copy activity? I believe the copy activity is not treated as a single DB transaction.

For UPSERT (copy activities) in SQL, we do have retry enabled, since the key columns ensure no duplicates are created. We’re wondering whether we can also enable retry for INSERT (copy activities).

In our other projects, we do have retry enabled for the copy activities involving files (since, per the link below, only the file that failed is re-copied).
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#resume-from-last-failed-run

Resume happens at file level. If copy activity fails when copying a file, in next run, this specific file will be re-copied.

Question is - will it be safe to enable RETRY for Copy Activities doing SQL inserts (Azure SQL to another Azure SQL table)? Will it cause us to run into duplicate records when a transient error happens in the middle of the operation?

Azure SQL Database
Azure Data Factory

Accepted answer
  Pratik Somaiya 4,201 Reputation points
    2022-11-21T08:03:06.047+00:00

    Hello @jldajo

    There are two things you can do to prevent duplicates:

    1) Enable Fault Tolerance in the Copy Activity: this lets the activity skip duplicate records; more details can be found here

    2) You can load the records into a staging table and add a pre-copy script that truncates the staging table before every copy run

    Then, you can execute a stored procedure that moves the staged rows into the target table in a way that avoids duplicates
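    The staging-table approach above can be sketched in T-SQL. Note this is a minimal illustration: the table and column names (`dbo.StagingOrders`, `dbo.Orders`, `OrderId`, etc.) are assumptions, not from the thread.

    ```sql
    -- Pre-copy script on the Copy Activity: every run (including a retry)
    -- starts from an empty staging table, so a retry simply reloads the batch.
    TRUNCATE TABLE dbo.StagingOrders;

    -- Stored procedure executed after the copy succeeds: insert only rows
    -- whose key is not already in the target, so re-running it after a
    -- transient failure never creates duplicates.
    CREATE OR ALTER PROCEDURE dbo.usp_MergeStagingOrders
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.Orders (OrderId, CustomerId, Amount)
        SELECT s.OrderId, s.CustomerId, s.Amount
        FROM dbo.StagingOrders AS s
        WHERE NOT EXISTS (
            SELECT 1
            FROM dbo.Orders AS o
            WHERE o.OrderId = s.OrderId
        );
    END;
    ```

    With this pattern, retry on the Copy Activity is safe: a failure mid-copy only leaves the staging table partially loaded, the retry truncates and reloads it, and the final insert is idempotent with respect to the target table.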

    2 people found this answer helpful.

0 additional answers
