Pipeline for few incremental tables & few flush and load tables.

RJ 326 Reputation points
2023-08-31T21:09:43.0366667+00:00

Hi there,

How to design ADF pipeline where few tables are incremental from source to target and few other tables are flush and load. I'm thinking I need to create separate pipelines 1 for incremental tables and other for flush and load.

Is there any other better way if its 100s of table.

Thanks

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,378 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,646 questions
{count} votes

1 answer

Sort by: Most helpful
  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-09-06T06:49:09.9766667+00:00

    Hi RJ ,

    Thankyou for using Microsoft Q&A platform and thanks for posting your question here.

    I understand that you want to know how to process full load for few tables and incremental load for other tables in synapse or ADF. Please let me know if that is not the requirement.

    You need to create a control/lookup table that will store all the metadata regarding which table needs to be loaded as full load, and which one as incremental load . Use watermark/benchmark column to determine the incremental records . For inc records , You need to use upsert logic to update if the record is already present in sink and insert if it's not present based on the keycolumn value.

    You can use lookup activity to fetch the details from the control table and then use foreach activity to loop through all the records and inside foreach, use if block to check if it's inc load/full load . Inside both true / false conditions blocks, use copy activity , and in full load, use truncate table query in pre copy script and in inc load, get the benchmark value and load only those records which are greater than that benchmark value.

    For entire implementation , kindly check out these below videos:

    How to do full load from On Premise SQL Server till ADLS using Azure Synapse Pipelines

    How to load latest and greatest data from ADLS to Dedicated SQL Pool using Synapse Pipelines

    How to perform incremental load from OnPremise SQL server to Dedicated Sql pool

    How to perform Upsert for Incremental records using Azure Synapse Pipelines

    Hope it helps . Please accept the answer in case you find it helpful and take the survey as well. Thankyou

    1 person found this answer helpful.

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.