I want to use this system in my work environment
by letting users put zip files into Azure Blob Storage and using Azure Data Factory to automate extracting/unzipping those zip files every day at a specific time.
I have followed the instructions from the following YouTube tutorial:
"How to UnZip Multiple Files which are stored on Azure Blob Storage By using Azure Data Factory"
https://www.youtube.com/watch?v=TEtpvdnULZ8
I found that this setup causes the system to extract every zip file (including zip files that have already been extracted) from directory A to directory B every time it runs.
However, if a zip file has already been extracted to directory B, it is extracted again and the existing files are overwritten, which causes unnecessary processing time: directory A holds 1000+ zip files, and only around 10 new zip files are uploaded daily.
So I want to find a way to configure the pipeline to extract only new zip files that have not been extracted yet, while skipping the 1000+ already-extracted ones.
Thank you very much for checking out this question, and feel free to suggest any solution you have in mind.
At the moment, I'm trying to find a way for Azure Data Factory to read the names of the folders in directory B (the destination directory) to determine which zip files have already been extracted, and then skip those.
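To make the idea concrete, here is a minimal sketch of the filtering logic I have in mind, written as plain Python. In ADF terms this would correspond to a Get Metadata activity (child items) on each directory followed by a Filter activity; the function name and the assumption that each zip is extracted into a folder with the same base name are mine, not something from the tutorial.

```python
def find_new_zips(zip_blob_names, extracted_folder_names):
    """Return the zips in directory A whose base name does not yet
    appear as a folder name in directory B (i.e. not yet extracted)."""
    extracted = set(extracted_folder_names)
    return [
        name
        for name in zip_blob_names
        if name.lower().endswith(".zip")
        and name[: -len(".zip")] not in extracted
    ]

# Example: "a.zip" was already extracted into folder "a",
# so only "b.zip" is considered new; non-zip blobs are ignored.
new_zips = find_new_zips(["a.zip", "b.zip", "notes.txt"], ["a"])
# → ["b.zip"]
```

With this filtering in place, the ForEach/Copy step would only iterate over `new_zips` instead of the full 1000+ files.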
Another method I have in mind is to move zip files out of directory A after extraction (for example, into an archive directory) so that already-processed files are never picked up again.
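For the move-after-extraction idea, the bookkeeping could look like the sketch below: map each processed zip under a source prefix to a destination path under an archive prefix, and then perform a copy-then-delete per blob (Blob Storage has no atomic move). The prefixes and function name are hypothetical placeholders, not paths from my actual setup.

```python
def plan_archive_moves(processed_zips, source_prefix="input/", archive_prefix="archive/"):
    """Map each processed zip blob path under source_prefix to its
    destination path under archive_prefix, preserving the relative name.
    Each mapping would then be executed as a copy followed by a delete."""
    moves = {}
    for blob in processed_zips:
        if blob.startswith(source_prefix):
            relative = blob[len(source_prefix):]
            moves[blob] = archive_prefix + relative
    return moves

# Example: after extracting input/a.zip, move it to archive/a.zip.
moves = plan_archive_moves(["input/a.zip", "input/b.zip"])
# → {"input/a.zip": "archive/a.zip", "input/b.zip": "archive/b.zip"}
```

In ADF this would be a Copy activity to the archive location followed by a Delete activity on the source blob, run inside the same ForEach iteration as the extraction.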