Hi jason2500 ,
Welcome to Microsoft Q&A platform and thanks for posting your query here.
As per my understanding, you are trying to copy data from on prem sql server to snowflake Datawarehouse using ADF pipeline and you want to know the best practice to achieve this . Please let me know if that is not the ask.
I understand that full load technique would take a lot of time since truncating and loading the entire set of records is only recommended for small set of data load. However, in your case, you may need to consider using incremental load technique.
There are two possible ways to achieve this.
- Create an end to end incremental load pipeline
- Use CDC (Change data capture) approach
However CDC also uses dataflow in the background , so for your case, you may opt for first approach.
To create incremental load pipeline, you would require to rely on 'Benchmark' or 'Watermark' column which would store what is the lastupdated date value for that particular record. You can write the query to load only the records from the source whose benchmarkvalue(onprem sql) >lastupdateddate in the sink(snowflake) .
To know more about the implementation, kindly checkout the below resources:
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview
https://www.youtube.com/watch?v=FXw1gPaa2-M&list=PLsJW07-_K61JkuvlNfCK9gTEft_N8MVsP&index=9
https://www.youtube.com/watch?v=IPpYDQ0QE_E&list=PLsJW07-_K61JkuvlNfCK9gTEft_N8MVsP&index=11
To implement CDC, you can check the below resources:
https://learn.microsoft.com/en-us/azure/data-factory/concepts-change-data-capture
https://www.youtube.com/watch?v=-NcyGDgiTEo
Hope it helps. Kindly accept the answer by clicking on Accept answer
button. Thankyou