How to perform Upsert for Incremental records using Azure Data Factory

pankaj chaturvedi 86 Reputation points
2022-09-15T08:24:43.08+00:00

Hi @AnnuKumari-MSFT ,

How can I perform an upsert of incremental records from an on-premises SQL Server database (source) to an Azure SQL DB (target) in an automated way, as I would like to migrate multiple tables at once? I have gone through this video (https://www.youtube.com/watch?v=UGfJ3s9YGjs&ab_channel=AzureContent%3AAnnu), but I didn't understand what the 'Benchmarkfield' and 'BenchMarkvalue' are, and if we are using [Keycolumns], then what is the use of these two columns? If you have already implemented this kind of scenario, could you please send screenshots if possible? Thanks.

Azure SQL Database
Azure Data Factory

1 answer

  1. AnnuKumari-MSFT 32,736 Reputation points Microsoft Employee
    2022-09-16T09:31:18.867+00:00

    Hi @pankaj chaturvedi ,

    Thank you for using the Microsoft Q&A platform, and thanks for posting your question here.

    As I understand your ask, you are trying to understand why 'BenchmarkKey' and 'BenchmarkValue' are needed in the incremental load process when key columns are already available for the upsert. Please let me know if my understanding is incorrect.

    The benchmark/watermark concept is used to identify the incremental records. A watermark is a column in the source table that holds the last-updated timestamp or an incrementing key.

    Suppose the last run of the ETL was a full load of 1 GB of data. Now the incremental records (e.g. 20 records) need to be either inserted or updated.

    The way to find those 20 newly created or updated records is to store the max(modifieddatetime) value from the source at the end of the last run, and then fetch all rows whose modifieddatetime is greater than that stored value.
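
    For illustration, here is a minimal T-SQL sketch of that incremental filter. The names dbo.SourceTable, ModifiedDateTime, and @LastWatermark are just example names I'm assuming, not ones from the video:

    ```sql
    -- Sketch: pull only the rows changed since the last run.
    -- @LastWatermark holds the max(ModifiedDateTime) saved after the previous run.
    DECLARE @LastWatermark DATETIME2 = '2022-09-01T00:00:00';  -- value read from the watermark store

    SELECT *
    FROM dbo.SourceTable
    WHERE ModifiedDateTime > @LastWatermark;  -- only the incremental records
    ```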

    This value, i.e. the max(modifieddatetime) from the last run, needs to be stored somewhere for us to make the comparison. That's where the benchmark key and value come into the picture.

    So here the BenchmarkKey would be modifieddatetime, and the max(modifieddatetime) value would be stored as the BenchmarkValue at the end of every run.
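
    One way to picture where that value lives is a small control/watermark table, one row per source table being loaded. The sketch below is hypothetical (the table dbo.WatermarkTable and its columns are my own illustration; the video may store it differently):

    ```sql
    -- Hypothetical control table playing the role of BenchmarkKey/BenchmarkValue.
    CREATE TABLE dbo.WatermarkTable
    (
        TableName      SYSNAME   NOT NULL PRIMARY KEY,  -- which source table this watermark belongs to
        BenchmarkKey   SYSNAME   NOT NULL,              -- e.g. 'ModifiedDateTime'
        BenchmarkValue DATETIME2 NOT NULL               -- max(ModifiedDateTime) from the last successful run
    );

    -- At the end of every successful run, advance the watermark to the
    -- max(ModifiedDateTime) that was just loaded.
    UPDATE dbo.WatermarkTable
    SET    BenchmarkValue = (SELECT MAX(ModifiedDateTime) FROM dbo.SourceTable)
    WHERE  TableName = 'dbo.SourceTable';
    ```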

    If we don't make use of BenchmarkKey and BenchmarkValue, and directly use key columns to perform the upsert, the operation would run against the whole dataset, i.e. 1 GB + 20 new records, instead of just the 20 incremental records.
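
    Conceptually, a key-column upsert behaves like a T-SQL MERGE. The rough sketch below (again with illustrative table and column names, not ADF's actual internals) shows why restricting the source to the incremental slice matters:

    ```sql
    -- Rough conceptual equivalent of an upsert on key columns.
    -- Without the watermark filter, 'src' would be the full 1 GB dataset;
    -- with it, 'src' is just the ~20 changed rows.
    DECLARE @LastWatermark DATETIME2 =
        (SELECT BenchmarkValue FROM dbo.WatermarkTable
         WHERE TableName = 'dbo.SourceTable');

    MERGE dbo.TargetTable AS tgt
    USING (
        SELECT *
        FROM dbo.SourceTable
        WHERE ModifiedDateTime > @LastWatermark   -- incremental slice only
    ) AS src
        ON tgt.Id = src.Id                        -- the key column(s)
    WHEN MATCHED THEN
        UPDATE SET tgt.SomeColumn       = src.SomeColumn,
                   tgt.ModifiedDateTime = src.ModifiedDateTime
    WHEN NOT MATCHED THEN
        INSERT (Id, SomeColumn, ModifiedDateTime)
        VALUES (src.Id, src.SomeColumn, src.ModifiedDateTime);
    ```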

    To get more information, please watch:
    How to do full load from On Premise SQL Server till ADLS using Azure Synapse Pipelines
    How to perform incremental load from OnPremise SQL server to Dedicated Sql pool

    Additional resources: Incrementally load data from Azure SQL Database to Azure Blob storage using the Azure portal
    The concept is the same for an on-premises source to an Azure SQL sink as well.

    Hope this helps. Please let us know if you have any further queries.

    ------------------------------

    • Please don't forget to click on "Accept Answer" or the upvote button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and helping shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
