Copying Excel files from location to another using ADF with datetime append

Syed Rashid Nizam 46 Reputation points
2022-05-17T04:18:28.437+00:00

In general copying files from one location to another is simple using Copy Activity of ADF. My problem statement is "I am trying to copy only excel files from Input folder to Archive folder, I would like to add, datetime stamp, otherwise if same file exists it overwrites the file"
Here is my pipeline
202574-image.png

Step 1: GetFileList get the files names correctly

202661-image.png

Step 2: Filter the files correctly as I am only interested in .xlsx files

202592-image.png

Step 3: For each filter files I would like to copy

202575-image.png

Step 4: Copy Activity "Source"

202662-image.png

Step 5: Copy Activity "Sink"

202583-image.png

"Sink DataSet config"

202643-image.png

expression "@markus.bohland@hotmail.de (replace(activity('FilterFiles').output,'.xlsx',''), '_', formatDateTime(convertTimeZone(utcnow(),'UTC','AUS Eastern Standard Time'),'yyyy-MM-ddTHHmmss'),'.xlsx')"

I get the validation check "Binary copy does not support copying from folder to file" All I am trying to copy the filer excel files to Archive folder with datetime stamp

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,643 questions
{count} votes

1 answer

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,036 Reputation points
    2022-05-17T18:00:49.243+00:00

    Hello @Syed Rashid Nizam ,
    Thanks for the question and using MS Q&A platform.

    I see you are having difficulty with your pipeline implementation. I see multiple causes, which I will try to address below.
    There is one point I was unsure about. Did you want to append the datetime to the name of the file, or to the data of the file?

    I think I see the problem. The activities inside the ForEach loop, the Copy and Delete -- you are trying to refer to the output of the activities outside the loop. Specifically the Filter output. You should be using @item() instead. When inside a loop, @item() refers to the current iteration's element. A piece of what is specified in the ForEach loop's Items property. Just like how it worked in the Filter activity.

    Furthermore, in the Copy Activity, you are specifying a wildcard file path, which would pick multiple files. This is being run once for each filename in the ForEach loop. This means you are copying the same files over and over again. I'm sure this is not what you intended.

    Also I notice the source Dataset has a parameter, which I expect is for filename. This way of specifying filename conflicts with or is overridden by your choice of "Wildcard file path" in the Copy Activity. If you want to use the dataset parameter to specify filename, use the option "File path in dataset". The parameter value then should be @item().name like in the Filter activity.

    If you want to append the datetime to the name of the sink file, then the sink dataset should be the one with the parameter. The source dataset can still have parameter if you want.
    To go from "myFile.xlsx" to "myFile20220517.xlsx" there are several steps.

    We want to separate "myFile" from the ".xlsx" so we can insert the date between them.

    @split(@item().name, ',')  
    
    "myFile.xlsx" => ["myFile", "xlsx"]  
    

    It is @item().name and not just @item() because that's how it was done in the filter activity.

    Next we want to get and maybe format the date or time. I'm assuming you want the time it was copied in UTC.

    @formatDateTime( utcnow(), "yyyyMMdd")  
    
    May 17 2022 13:50 => 20220517  
    

    and of course put it all together again. Since you only wanted to do .xlsx files, I'm going to take a shortcut.

    @concat(  
        split( item().name ) [0] ,  
        formatDateTime( utcnow() , "yyyyMMdd" ) ,  
        '.xlsx'  
    )  
    

    I haven't tested the code yet, but I'm pretty sure I'm close. I might have mixed up " and '.

    If you wanted to put the datetime in an additional column of data, you can't do it in Binary dataset. If it can be done, it would be in the excel dataset type. In copy activity under source > "additional columns".

    Please do let me if you have any queries.

    Thanks
    Martin


    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
      • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators