Choosing an Approach for Incremental Loading with Watermark in Azure Data Factory: Efficiency and Cost Considerations

Lucas Medina 20 Reputation points
2024-01-13T01:55:43.14+00:00

Hi all, I'm working on implementing an Azure Data Factory pipeline for incremental data loading using a watermark table approach. I have identified two different approaches but am unsure which one is considered the best practice in terms of cost-effectiveness and efficiency.

Approach One: This approach is detailed in a YouTube video here: https://www.youtube.com/watch?v=t1kWzdAP3kk

Approach Two: The second approach is outlined in a Stack Overflow answer here: How to Perform Incremental Historical Data Load from SQL On-Premise to Azure Blob Storage with Dynamic Folder Structure?

My Question: Which of these approaches is generally considered best practice for incremental loading in Azure Data Factory when considering both cost-effectiveness and efficiency? Are there any significant pros or cons to each method that I should be aware of? Any insights or experiences with either of these methods would be greatly appreciated.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Harishga 4,875 Reputation points Microsoft Vendor
    2024-01-16T08:34:57.13+00:00

    Hi Lucas Medina,

    Welcome to the Microsoft Q&A platform, and thanks for posting your question here.

    Approach One, as detailed in the YouTube video, uses a Lookup activity to retrieve the maximum value of the watermark column from the destination table, then a Copy activity to copy only the rows from the source table whose watermark value is newer. It is simple to implement, but scanning the destination table for the maximum watermark on every run can become slow and costly as that table grows, especially when it is large and heavily partitioned.
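
    A minimal sketch of the two queries this pattern needs, assuming a hypothetical LastModifiedTime watermark column, tables named dbo.SourceTable and dbo.DestinationTable, and a Lookup activity named LookupWatermark (none of these names come from the video):

    ```sql
    -- Query for the Lookup activity: fetch the current high-water mark
    -- from the destination table.
    SELECT MAX(LastModifiedTime) AS WatermarkValue
    FROM dbo.DestinationTable;

    -- Source query for the Copy activity. The @{...} expression is ADF
    -- dynamic content that injects the Lookup activity's output at run time.
    SELECT *
    FROM dbo.SourceTable
    WHERE LastModifiedTime > '@{activity('LookupWatermark').output.firstRow.WatermarkValue}';
    ```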

    Approach Two, as outlined in the Stack Overflow answer, uses a stored procedure to retrieve the new or updated rows from the source table and a Copy activity to write them to the destination. Because the filtering logic lives in the stored procedure, you get finer control over data retrieval, which can make it more efficient and cost-effective for large tables with many partitions. It does require more setup, however: you need to create the stored procedure and configure a linked service (backed by a self-hosted integration runtime) to reach the on-premises SQL Server.
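
    As a rough illustration of what such a stored procedure might look like (all object names here are hypothetical, not taken from the linked answer):

    ```sql
    -- Hypothetical procedure on the on-premises SQL Server. In the Copy
    -- activity, set the source to "Stored procedure" and bind @LastWatermark
    -- to the previous watermark value via dynamic content.
    CREATE PROCEDURE dbo.usp_GetChangedRows
        @LastWatermark DATETIME2
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Return only rows changed since the last successful load.
        SELECT Id, Col1, Col2, LastModifiedTime
        FROM dbo.SourceTable
        WHERE LastModifiedTime > @LastWatermark;
    END;
    ```

    Keeping the filter inside a procedure also means you can later add joins, soft-delete handling, or folder-path logic without editing the pipeline itself.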

    In terms of cost-effectiveness and efficiency, Approach Two tends to be the better option for large, heavily partitioned tables, because the stored procedure keeps the filtering work on the database side. For smaller tables, or tables with fewer partitions, Approach One is usually sufficient and quicker to set up.
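
    Whichever approach you choose, the pipeline normally ends by persisting the new high-water mark so that the next run starts where this one finished. A minimal sketch, assuming a dedicated watermark table and a Stored Procedure activity that runs after the Copy activity succeeds (again, names are illustrative):

    ```sql
    -- Hypothetical watermark table: one row per source table being loaded.
    CREATE TABLE dbo.WatermarkTable (
        TableName      SYSNAME   NOT NULL PRIMARY KEY,
        WatermarkValue DATETIME2 NOT NULL
    );

    -- Called by a Stored Procedure activity after a successful copy.
    CREATE PROCEDURE dbo.usp_UpdateWatermark
        @TableName    SYSNAME,
        @NewWatermark DATETIME2
    AS
    BEGIN
        UPDATE dbo.WatermarkTable
        SET WatermarkValue = @NewWatermark
        WHERE TableName = @TableName;
    END;
    ```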

    It's also worth noting that there are other approaches, or variations on these two, that may suit a specific scenario better (for example, using SQL Server Change Tracking as the change-detection mechanism instead of a watermark column). Consider the specific requirements and constraints of your project when deciding on the best approach for incremental loading in Azure Data Factory.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further query, do let us know.