Copy files based on date in filename [ADF]

Sorrow In Yellow 66 Reputation points
2021-04-21T12:29:53.753+00:00

Hi,

I want to copy and load files in an Azure SQL DB from 3 CSV files.
The files are named:
AA_20210101_052218.csv
AA_20210102_052218.csv
AA_20210103_052218.csv

The rule is to copy and load only the file with maximum date, which is in this case AA_20210103_052218.csv
I want to know how can this be done using azure data factory

Thank you !

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
6,305 questions
No comments
{count} votes

Accepted answer
  1. KranthiPakala-MSFT 33,136 Reputation points Microsoft Employee
    2021-04-21T23:55:30.55+00:00

    Hi @Sorrow In Yellow ,

    Thanks for reaching out.

    You can follow a similar approach explained in this thread: ADF: copy last modified blob

    But in your case you have the date part in your file name, so to retrieve the date part you will have to use a split function to split your file name to ["AA","20210103","052218.csv"] then only pick the index 1 which will only return date part. Sample expression - @string(split(item(), '_')[1])

    I have tested it and works fine. For input I used an array parameter which passes the all the file names for testing, but if your files are in a container, you can use GetMetadata to get the filename and pass to a ForEach activity.

    90070-image.png

    You will have to declare few variables which will be used to compare the file date value with a reference date value and also to capture the relevant file name which you will be using in Copy activity after For Each activity as shown below:

    90077-image.png

    Here is the GIF:

    90049-splitfilenamecopylatestdate.gif

    After ForEach activity I have used a setVarible activity for demonstration but instead you can use a copy activity and pass the fileName variable value in file path settings of your Copy source.

    Another option is to use an Azure Function and write your own logic for sorting the most recent file using the date value as explained in this demonstration by a community expert: Azure Data Factory - Copy the latest file with DATE PART to Folder using AZURE FUNCTIONS

    Hope this helps. Do let us know if you have further query.

    ----------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.


1 additional answer

Sort by: Most helpful
  1. Naami.Ayman 331 Reputation points
    2021-04-23T11:56:06.89+00:00

    I found out a solution for this.
    I created a stored procedure to sort my files.

    Here it is:

    CREATE PROCEDURE dtm.sortcsvfiles
         @JsonFile     NVARCHAR(MAX)
       , @Debug    INT           = 0
    AS
        BEGIN
    
            DECLARE 
                 @sqlQ NVARCHAR(MAX) = 'SELECT *
    FROM OPENJSON(@JsonFile) WITH(name NVARCHAR(200) ''$.name'')
    ORDER BY left(right(name,19),8)' 
            IF @Debug = 0
                BEGIN
                    EXEC sp_executesql 
                       @sqlTransform
                     , N'@JsonFile nvarchar(max)'
                     , @jsonFile = @jsonFile
            END
                ELSE
                BEGIN
                    SELECT @sqlQ
            END
        END
    

    Then I created a lookup activity using this sp, to take getmetadata.ChildItems as an input (JsonFile), and give sorted files as an output. I used a forEach loop to load files based on the sort order in lookup activity.

    I hope this is helpful.

    Thank you

    No comments