For each table requiring UPSERT, create a subsequent activity (such as a Lookup or a Copy activity) to retrieve the data from the corresponding table in the reporting database and use a Join activity to join the source and target datasets based on the appropriate key column(s), and finally create a Mapping Data Flow activity or a custom activity (such as a Stored Procedure activity) to perform the UPSERT operation (insert/update) on the joined dataset.
How to eliminate few tables from the bulk load
Hi All,
I have a scenario where both the databases are Azure SQL Databases, one is the PRODUCTION db and the other is used only for reporting. We have a pipeline that BULK loads all the tables from PROD--->REPORT. We have a total of 94 tables. My ask is , is there a way to exclude few tables from this bulk load process? Majority of these tables are TRUNCATE INSERT, few of these tables need is an UPSERT. Please advice.
Thanks
2 answers
Sort by: Most helpful
-
-
AnnuKumari-MSFT 34,361 Reputation points Microsoft Employee
2023-07-18T14:23:30.82+00:00 Hi Bigiron ,
Thankyou for using Microsoft Q&A platform and thanks for posting your question here.
As I understand your query, you want to know how to cherry pick the tables and load the data of only specific tables from one DB to another using ADF pipeline. Please let me know if that is not the ask here.
You can create a control table in your source SQL DB, and use that for looking up for Active tables
In ADF, use lookup activity to select only those tablenames where isActive flag is 'Y' and use foreach activity to iterate through those tables and using copy activity, load the data from source to destination .
You can also incorporate the logic to perform full load/Incremental load for each of the tables based on the size of the data for each table. For example, if a table has small set of data, you can keep it as full load table and perform truncate and reload everytime. However, if there is huge set of data, then it's better to implement incremental load for those tables and perform upsert for changed data.
For reference, you can go through these videos in sequence, which walksthrough the same requirement in synapse analytics, however, it can be implemented in ADF too changing the sink to SQL DB instead of synapse.
How to do full load from On Premise SQL Server till ADLS
How to load latest and greatest data
How to perform incremental load
How to perform Upsert for Incremental records
Additionally, you can explore CDC feature in ADF which is the easiest and quickest way to handle incremental load in SQL. For more details, kindly check the below resources:
Change data capture in Azure Data Factory and Azure Synapse Analytics
CDC(change data capture) for SQL Source in Mapping data flows in Azure Data Factory or Synapse
CDC (change data capture) Resource in Azure Data Factory
Hope it helps. Kindly accept the answer and take the survey. Thankyou