How to exclude a few tables from the bulk load

Bigiron 110 Reputation points
2023-07-17T16:26:27.3666667+00:00

Hi All,

I have a scenario where both databases are Azure SQL Databases: one is the PRODUCTION db and the other is used only for reporting. We have a pipeline that bulk loads all the tables from PROD--->REPORT. We have a total of 94 tables. My question is: is there a way to exclude a few tables from this bulk-load process? The majority of these tables are TRUNCATE/INSERT; a few of them need an UPSERT. Please advise.

Thanks

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Vahid Ghafarpour 15,410 Reputation points
    2023-07-17T18:58:56.3133333+00:00

    For each table requiring an UPSERT, create a subsequent activity (such as a Lookup or Copy activity) to retrieve the data from the corresponding table in the reporting database. Join the source and target datasets on the appropriate key column(s). Finally, use a Mapping Data Flow activity or a custom activity (such as a Stored Procedure activity) to perform the UPSERT operation (insert/update) on the joined dataset.
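
    As a rough sketch of the stored-procedure variant, assuming a Copy activity first lands the source rows in a staging table (the names dbo.Orders, dbo.Orders_Staging, and usp_Upsert_Orders are illustrative, not from the question):

    ```sql
    -- Hypothetical target (dbo.Orders) and staging table (dbo.Orders_Staging)
    -- that the Copy activity loads; called from a Stored Procedure activity.
    CREATE OR ALTER PROCEDURE dbo.usp_Upsert_Orders
    AS
    BEGIN
        SET NOCOUNT ON;

        MERGE dbo.Orders AS tgt
        USING dbo.Orders_Staging AS src
            ON tgt.OrderID = src.OrderID
        WHEN MATCHED THEN
            UPDATE SET tgt.Amount       = src.Amount,
                       tgt.ModifiedDate = src.ModifiedDate
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (OrderID, Amount, ModifiedDate)
            VALUES (src.OrderID, src.Amount, src.ModifiedDate);

        -- Clear the staging table for the next pipeline run.
        TRUNCATE TABLE dbo.Orders_Staging;
    END;
    ```

    The MERGE keys on the table's primary key, updating matched rows and inserting new ones, which is the insert/update semantics described above.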


  2. AnnuKumari-MSFT 29,601 Reputation points Microsoft Employee
    2023-07-18T14:23:30.82+00:00

    Hi Bigiron,

    Thank you for using the Microsoft Q&A platform, and thanks for posting your question here.

    As I understand your query, you want to know how to cherry-pick tables and load the data of only specific tables from one DB to another using an ADF pipeline. Please let me know if that is not the ask here.

    You can create a control table in your source SQL DB and use it to look up the active tables.

    In ADF, use a Lookup activity to select only those table names where the isActive flag is 'Y', use a ForEach activity to iterate through those tables, and use a Copy activity inside the loop to load the data from source to destination.
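
    A minimal sketch of such a control table and the Lookup query, with illustrative table names (dbo.ETL_ControlTable and the sample rows are assumptions, not from the question):

    ```sql
    -- Hypothetical control table in the source DB.
    CREATE TABLE dbo.ETL_ControlTable
    (
        TableName SYSNAME     NOT NULL PRIMARY KEY,
        IsActive  CHAR(1)     NOT NULL DEFAULT 'Y',   -- 'N' excludes the table from the load
        LoadType  VARCHAR(20) NOT NULL DEFAULT 'FULL' -- 'FULL' (truncate/insert) or 'INCR' (upsert)
    );

    INSERT INTO dbo.ETL_ControlTable (TableName, IsActive, LoadType)
    VALUES ('dbo.Customers', 'Y', 'FULL'),
           ('dbo.Orders',    'Y', 'INCR'),
           ('dbo.AuditLog',  'N', 'FULL');  -- excluded from the bulk load

    -- Query for the Lookup activity; pass its output array to the ForEach activity.
    SELECT TableName, LoadType
    FROM dbo.ETL_ControlTable
    WHERE IsActive = 'Y';
    ```

    Flipping IsActive to 'N' then excludes a table without touching the pipeline, and LoadType lets an If Condition inside the ForEach branch between truncate/insert and upsert.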

    You can also incorporate logic to perform a full load or an incremental load for each table based on the size of its data. For example, if a table has a small set of data, you can keep it as a full-load table and truncate and reload it every time. However, if a table holds a huge set of data, it is better to implement an incremental load and upsert only the changed data.
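
    One common way to drive such an incremental load is a watermark table; a hedged sketch, where dbo.ETL_Watermark, dbo.Orders, and the ModifiedDate column are illustrative assumptions:

    ```sql
    -- Hypothetical watermark table tracking the last successful load per table.
    CREATE TABLE dbo.ETL_Watermark
    (
        TableName    SYSNAME   NOT NULL PRIMARY KEY,
        LastLoadDate DATETIME2 NOT NULL
    );

    -- Source query for the Copy activity on an incremental table;
    -- @LastLoadDate comes from a Lookup on dbo.ETL_Watermark.
    SELECT *
    FROM dbo.Orders
    WHERE ModifiedDate > @LastLoadDate;

    -- After a successful copy, advance the watermark for the next run.
    UPDATE dbo.ETL_Watermark
    SET LastLoadDate = SYSUTCDATETIME()
    WHERE TableName = 'dbo.Orders';
    ```

    Only rows modified since the previous run are copied, and those rows are then upserted into the reporting table.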

    For reference, you can go through these videos in sequence, which walk through the same requirement in Synapse Analytics; it can be implemented in ADF too by changing the sink to SQL DB instead of Synapse.

    How to do full load from On Premise SQL Server till ADLS

    How to load latest and greatest data

    How to perform incremental load

    How to perform Upsert for Incremental records

    Additionally, you can explore the CDC feature in ADF, which is the easiest and quickest way to handle incremental loads in SQL. For more details, kindly check the resources below:

    Change data capture in Azure Data Factory and Azure Synapse Analytics

    CDC(change data capture) for SQL Source in Mapping data flows in Azure Data Factory or Synapse

    CDC (change data capture) Resource in Azure Data Factory
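
    For the SQL side of CDC, the source database must have change data capture enabled before ADF can read the change feed. A sketch using the documented system procedures (schema and table names are illustrative, and CDC availability depends on your Azure SQL Database tier):

    ```sql
    -- Enable CDC at the database level on the source.
    EXEC sys.sp_cdc_enable_db;

    -- Enable CDC for an individual table (dbo.Orders is a hypothetical example).
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Orders',
        @role_name     = NULL;  -- NULL: no gating role restricting access to change data
    ```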

    Hope it helps. Kindly accept the answer and take the survey. Thank you.