ADF V2: Upsert using Copy Activity & Mapping Data Flow

Cloud_Knight 1 Reputation point
2022-02-11T09:04:24.073+00:00

I'm doing an incremental data load in one of my projects. For this, I will be performing upsert (insert & update) logic from an Azure SQL staging database to a warehouse (Azure SQL) database.

Initially we thought of using a Mapping Data Flow (Source -> Alter Row transformation -> Sink) to achieve this. However, I found that the upsert feature is also available in the copy activity under the sink tab.

I have tested the upsert logic available in the copy activity and it is working fine (yet to test with larger datasets). Based on the test results, I'm planning to do the incremental loads using the copy activity upsert, since it has the following benefits over Data Flows (a simplified sketch of my sink configuration follows the list):

  1. Lower cost compared to Mapping Data Flows.
  2. Less processing time for smaller datasets (tested with 300k rows), since there is no need to spin up a Spark cluster, which saves the 3 to 5 minutes it takes to create one.
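
This is the copy activity sink configuration I tested, simplified; the key column name "CustomerId" is illustrative rather than my real schema:

    "sink": {
        "type": "AzureSqlSink",
        "writeBehavior": "upsert",
        "upsertSettings": {
            "useTempDB": true,
            "keys": [ "CustomerId" ]
        }
    }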

I would like to understand which of the above options is the preferred way to do upserts, and what the difference is between doing an upsert in the copy activity and in a Mapping Data Flow.

Any suggestions are greatly appreciated.


3 answers

  1. MartinJaffer-MSFT 26,061 Reputation points
    2022-02-14T22:47:34.04+00:00

    Hello and welcome to Microsoft Q&A @Cloud_Knight .

    You have the right of it so far. Mapping Data Flows allows for more transformations, including transformations that are not possible with the Copy Activity. For simple copying, the Copy Activity is preferred.

    In both cases, you need to specify the key column(s); without them, the upsert would just be an insert. The Data Flow allows for more complex upsert criteria: in the Alter Row transformation you can use arbitrarily complicated logic to mark rows for upsert.


  2. Cloud_Knight 1 Reputation point
    2022-02-14T22:56:25.147+00:00

    Thank you @MartinJaffer-MSFT .

    However, I'm getting an issue when I try to upsert larger datasets (approx. 500K rows) using the copy activity:

    ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed. Please search error to get more details.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
    The statement has been terminated.,Source=.Net SqlClient Data Provider,SqlErrorNumber=-2,Class=11,ErrorCode=-2146232060,State=0,Errors=[{Class=11,Number=-2,State=0,Message=Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.,},{Class=0,Number=3621,State=0,Message=The statement has been terminated.,},],''Type=System.ComponentModel.Win32Exception,Message=The wait operation timed out,Source=,'

    I tried the below options (my resulting sink JSON is sketched after this list):

    1. Unchecked the "Use TempDB" option and set "Select user DB schema" to the dbo schema.

    2. Set "Write batch size" to 10,000 rows.

    3. Set "Write batch timeout" to 00:30:00 (30 minutes).

    None of the options worked. :-(
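
    After those changes, my sink looks roughly like this; the key column name is again illustrative:

        "sink": {
            "type": "AzureSqlSink",
            "writeBehavior": "upsert",
            "upsertSettings": {
                "useTempDB": false,
                "interimSchemaName": "dbo",
                "keys": [ "CustomerId" ]
            },
            "writeBatchSize": 10000,
            "writeBatchTimeout": "00:30:00"
        }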

    Also, I'm not able to add dynamic content to the Key columns textbox. I tried this to pass the keys dynamically from parameters, but unfortunately no parameters were showing up in the dynamic content pane for Key columns.
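
    This is roughly what I was hoping to express in the pipeline JSON. The keyColumns pipeline parameter and the expression object on "keys" are hypothetical; I have not been able to test whether this property accepts an expression at all:

        "upsertSettings": {
            "useTempDB": false,
            "interimSchemaName": "dbo",
            "keys": {
                "value": "@pipeline().parameters.keyColumns",
                "type": "Expression"
            }
        }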

    Can you suggest anything?


  3. Cloud_Knight 1 Reputation point
    2022-03-01T09:48:36.997+00:00

    @MartinJaffer-MSFT Any update on this issue?