Hello @Arbi Nazarian ,
Thanks for the question and using MS Q&A platform.
As we understand the ask here is how to design the ETL which helps to capture/implement incremental load , please do let us know if its not accurate.
As per me using delta file for track the changes is the correct approach .
You asked :
How can I implement the incremental load feature that is offered in Azure and Delta Files?
Do I need to do this in a notebook, or can I do it in the Synapse UI? (Using Autoloader or Copy into?)
You have two options here
Options #1
Use mapping data flow ( MDF ) : MDF is used mostly for transformation . You can read the data from the two files , join them and then use function like CRC32 to have the fingerprint of the row of the incoming files . Do a similar thing on the past file and compare the fongerprints and determine the delta .
Option #2
You also use Synapse Analytics also ,, it offers something called spark pool ( whcih runs on Apache Spark ) and you can read the file and join the dataframes and use the hash function to determine the delta .
Are there any special requirements that I must be aware of?
Since I have not seen the data , i will say on a high level this should be fine .
Can I use partitioning and use the year/month/date key partition technique to use in my incremental load? Not sure if partitions are to be used for incremental loading (our data is not that
big so no real need for partitioning, but would be helpful down the line)
If you have a columns in the existing dataset , whcih always gets updated on INSERT and UPDATE , please go for this options .
Also just curious how are you planning to handle the deleted records on the source ?
Please do let me if you have any queries.
Thanks
Himanshu
- Please don't forget to click on
or upvote
button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
- Want a reminder to come back and check responses? Here is how to subscribe to a notification
- If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators