question

SyedRashidNizam-1883 avatar image
0 Votes"
SyedRashidNizam-1883 asked MartinJaffer-MSFT commented

Copying Excel files from location to another using ADF with datetime append

In general copying files from one location to another is simple using Copy Activity of ADF. My problem statement is "I am trying to copy only excel files from Input folder to Archive folder, I would like to add, datetime stamp, otherwise if same file exists it overwrites the file"
Here is my pipeline
202574-image.png


Step 1: GetFileList get the files names correctly

202661-image.png

Step 2: Filter the files correctly as I am only interested in .xlsx files

202592-image.png

Step 3: For each filter files I would like to copy

202575-image.png

Step 4: Copy Activity "Source"

202662-image.png

Step 5: Copy Activity "Sink"

202583-image.png

"Sink DataSet config"

202643-image.png

expression "@concat(replace(activity('FilterFiles').output,'.xlsx',''), '_', formatDateTime(convertTimeZone(utcnow(),'UTC','AUS Eastern Standard Time'),'yyyy-MM-ddTHHmmss'),'.xlsx')"

I get the validation check "Binary copy does not support copying from folder to file" All I am trying to copy the filer excel files to Archive folder with datetime stamp

azure-data-factory
image.png (24.4 KiB)
image.png (41.6 KiB)
image.png (36.0 KiB)
image.png (42.8 KiB)
image.png (57.7 KiB)
image.png (45.1 KiB)
image.png (37.7 KiB)
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello @SyedRashidNizam-1883,
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution, please do share that same with the community as it can be helpful to others. Otherwise, will respond back with the more details and we will try to help.

Thanks
Martin

0 Votes 0 ·

1 Answer

MartinJaffer-MSFT avatar image
0 Votes"
MartinJaffer-MSFT answered MartinJaffer-MSFT commented

Hello @SyedRashidNizam-1883,
Thanks for the question and using MS Q&A platform.

I see you are having difficulty with your pipeline implementation. I see multiple causes, which I will try to address below.
There is one point I was unsure about. Did you want to append the datetime to the name of the file, or to the data of the file?

I think I see the problem. The activities inside the ForEach loop, the Copy and Delete -- you are trying to refer to the output of the activities outside the loop. Specifically the Filter output. You should be using @item() instead. When inside a loop, @item() refers to the current iteration's element. A piece of what is specified in the ForEach loop's Items property. Just like how it worked in the Filter activity.

Furthermore, in the Copy Activity, you are specifying a wildcard file path, which would pick multiple files. This is being run once for each filename in the ForEach loop. This means you are copying the same files over and over again. I'm sure this is not what you intended.

Also I notice the source Dataset has a parameter, which I expect is for filename. This way of specifying filename conflicts with or is overridden by your choice of "Wildcard file path" in the Copy Activity. If you want to use the dataset parameter to specify filename, use the option "File path in dataset". The parameter value then should be @item().name like in the Filter activity.

If you want to append the datetime to the name of the sink file, then the sink dataset should be the one with the parameter. The source dataset can still have parameter if you want.
To go from "myFile.xlsx" to "myFile20220517.xlsx" there are several steps.

We want to separate "myFile" from the ".xlsx" so we can insert the date between them.

 @split(@item().name, ',')
    
 "myFile.xlsx" => ["myFile", "xlsx"]

It is @item().name and not just @item() because that's how it was done in the filter activity.

Next we want to get and maybe format the date or time. I'm assuming you want the time it was copied in UTC.

 @formatDateTime( utcnow(), "yyyyMMdd")
    
 May 17 2022 13:50 => 20220517

and of course put it all together again. Since you only wanted to do .xlsx files, I'm going to take a shortcut.

 @concat(
     split( item().name ) [0] ,
     formatDateTime( utcnow() , "yyyyMMdd" ) ,
     '.xlsx'
 )

I haven't tested the code yet, but I'm pretty sure I'm close. I might have mixed up " and '.

If you wanted to put the datetime in an additional column of data, you can't do it in Binary dataset. If it can be done, it would be in the excel dataset type. In copy activity under source > "additional columns".

Please do let me if you have any queries.

Thanks
Martin


  • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how

  • Want a reminder to come back and check responses? Here is how to subscribe to a notification

  • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators






· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thanks for your response, that's correct, I would like to append datetime in the file name, "Myfile_FEB2022.xlsx" to "Myfile_FEB2022_20220101.xlsx".
I made few changing in copy activity (inside the loop) as you suggested, screen shot below, I still don't understand why copy activity copying both files (".txt" and ".xlslx", despite filter is bringing only one .xlsx file and passing only item count 1 "for each activity", I made the change @item().name, shouldn't this only bring 1 file?
202962-copyact-src.png


New Sink
202953-image.png

Expression "@concat(replace(item().name,'.xlsx',''), '_', formatDateTime(convertTimeZone(utcnow(),'UTC','AUS Eastern Standard Time'),'yyyy-MM-ddTHHmmss'),'.xlsx')"


0 Votes 0 ·
copyact-src.png (38.4 KiB)
image.png (44.3 KiB)
MartinJaffer-MSFT avatar image MartinJaffer-MSFT SyedRashidNizam-1883 ·

Yes, @SyedRashidNizam-1883 , it should be only bringing one file. I'd do a debug run and check the input and output of each activity. Or use Set Variables to force a display in the output. That is especially useful for troubleshooting expressions.

Next thing to check is the dataset, is it using the parameter correctly? If you forgot to put the @dataset.pFileName into the file portion of the filepath, all would be explained.

Everything else looks fine. Might you be doing "Trigger Now" and not have published the last changes? Trigger now works on the last published version. Debug run works on whatever your browser is showing.

Also good job on noticing that shortcut, replacing the .xlsx instead of doing split.

0 Votes 0 ·

@MartinJaffer-MSFT : thought to tag you as well, much appreciate your help to understand what's going wrong here.

0 Votes 0 ·