How to copy files from a file system to ADLS Gen2 in ADF

Tinashe Chinyati 221 Reputation points
2020-12-06T19:09:09.083+00:00

Greetings
I am new to ADF. I have on-premises storage that receives small files every minute, and I would like to copy these files based on the timestamp in the filename, then sink each file to the matching folder in ADLS Gen2. For example, a file named H_ODG_20201206_213412_00.CSV should land in Year=2020, Month=12, Day=06, and so on. Since there will be many files with different dates, I want to create a tumbling window trigger that filters what to copy based on the timestamp in the file name. Thanks for your help.

Azure Data Factory

Accepted answer
  1. Tinashe Chinyati 221 Reputation points
    2020-12-18T00:28:05.677+00:00

    Greetings @HimanshuSinha-msft
    I was able to work out the solution; apologies for the delay. I am now using two Copy activities with Binary datasets. Because our files keep accumulating in the source folder, I was allowed to delete the source files while copying them to another file system storage, creating a date partition along the way. The pipeline reads the date part of the filename, e.g. Q_ODP_20201218_2334_00.CSV. If we want to load historic data, we specify the range using the TWT. The sink path can be adjusted depending on the spec; in this case it is the same. The following was my resolution, and it seems to work. Thanks.
    [Screenshots: 49259-twt-parameters.png, 49371-adls-copy.png, 49313-copy-and-delete-on-prem.png, 49273-sink-partition-binary.png]
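
For anyone who wants to sanity-check the date partitioning outside ADF, here is a minimal Python sketch of the same idea: pull the yyyyMMdd part out of the file name and build the sink folder from it. The function name `partition_path`, the regular expression, and the nested `Year=/Month=/Day=` layout are assumptions for illustration, inferred from the two sample file names in this thread; this is not the pipeline itself.

```python
import re
from datetime import datetime

# Assumed naming convention: the first 8-digit block between underscores
# is the file date (yyyyMMdd), as in H_ODG_20201206_213412_00.CSV.
_DATE_PART = re.compile(r"_(\d{8})_")


def partition_path(file_name: str) -> str:
    """Return a hypothetical sink path such as Year=2020/Month=12/Day=06/<file>."""
    match = _DATE_PART.search(file_name)
    if match is None:
        raise ValueError(f"no yyyyMMdd date part in {file_name!r}")
    # strptime also rejects impossible dates such as 20201340.
    day = datetime.strptime(match.group(1), "%Y%m%d")
    return f"Year={day:%Y}/Month={day:%m}/Day={day:%d}/{file_name}"


if __name__ == "__main__":
    print(partition_path("H_ODG_20201206_213412_00.CSV"))
    print(partition_path("Q_ODP_20201218_2334_00.CSV"))
```

Parsing with `strptime` rather than slicing the string means a malformed date fails loudly instead of producing a bogus folder.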


1 additional answer

  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2020-12-08T04:32:30.303+00:00

    Hello @Tinashe Chinyati,

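    Thanks for the question and for using the forum.
    The TWT exposes two system variables that capture the window start and end times: trigger().outputs.windowStartTime and trigger().outputs.windowEndTime. Please do read about them in the tumbling window trigger documentation.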

    Then use an If Condition activity with the expression below, which keeps a file when its date falls between the window start and end dates. I think that's all you need.
    @and(greaterOrEquals(int(variables('name')),int(formatDateTime(trigger().outputs.windowStartTime,'yyyyMMdd'))),lessOrEquals(int(variables('name')),int(formatDateTime(trigger().outputs.windowEndTime,'yyyyMMdd'))))
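
To check which file dates a given window would pick up before wiring this into the pipeline, the comparison can be mimicked in a few lines of Python. This is only a sketch, assuming the intent is "file date falls between the window start and end dates, inclusive"; the function name `in_window` is made up for illustration, and `file_date` stands in for the value of variables('name').

```python
from datetime import datetime


def in_window(file_date: str, window_start: datetime, window_end: datetime) -> bool:
    """Compare yyyyMMdd values as integers, like int(formatDateTime(..., 'yyyyMMdd'))."""
    start = int(window_start.strftime("%Y%m%d"))
    end = int(window_end.strftime("%Y%m%d"))
    # Inclusive on both ends (an assumption): a file dated on the end day also matches.
    return start <= int(file_date) <= end


if __name__ == "__main__":
    start = datetime(2020, 12, 6)
    end = datetime(2020, 12, 7)
    for name in ("20201205", "20201206", "20201207", "20201208"):
        print(name, in_window(name, start, end))
```

Note that with whole-day granularity and inclusive bounds, a file dated on the window end day matches two consecutive daily windows, so the upper comparison may need to be strict if files must be copied exactly once.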

    Let me know how it goes.

    Thanks
    Himanshu