Copy Data activity Fails to fetch latest file from ADLS

Sarvesh Pandey 71 Reputation points
2024-03-05T01:20:01.42+00:00

Hi All,

I am using ADF to fetch the latest file from ADLS, but my Copy Data activity is failing because it needs either a file name or `*` in the wildcard path.

This is quite confusing. I can't predict which file will be the latest, and I don't want all the files to be copied.

Error - ErrorCode=FormatBasedDatasetMissingFileName,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=One of the property "file name" or "wildcard file name" is required. If you want to copy all the files from a folder instead of copy single file, you can put "*" in your wildcard file name,Source=Microsoft.DataTransfer.ClientLibrary,'

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. phemanth 10,010 Reputation points Microsoft Vendor
    2024-03-05T07:14:47.7266667+00:00

    @Sarvesh Pandey

    Thanks for reaching out to Microsoft Q&A.

    I understand that you’re trying to use Azure Data Factory (ADF) to copy the latest file from Azure Data Lake Storage (ADLS), but you’re encountering an error because the copy data activity requires a file name or a wildcard path. Here’s a possible solution:

    You can create a pipeline in ADF to get the latest file from a folder in ADLS. This involves using the GetMetadata and If Condition activities within a ForEach loop.

    Here’s a step-by-step guide:

    You can set modifiedDatetimeStart and modifiedDatetimeEnd to filter the files in the folder when you use the ADLS connector in the Copy activity.

    There are typically two situations:

    1. The data is pushed by an external source on a known schedule, so you can configure the time window from that schedule.

    2. The pushes happen at random times; then you may have to log the push time somewhere else and pass it as a parameter into the copy pipeline before executing it.
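    Either way, the time window is applied on the Copy activity source. As a sketch, a Copy activity source payload for ADLS Gen2 using such a window might look like the following (the source type and the timestamp values are placeholders, not taken from your pipeline):

```json
{
  "source": {
    "type": "DelimitedTextSource",
    "storeSettings": {
      "type": "AzureBlobFSReadSettings",
      "recursive": true,
      "wildcardFileName": "*",
      "modifiedDatetimeStart": "2024-03-04T00:00:00Z",
      "modifiedDatetimeEnd": "2024-03-05T00:00:00Z"
    }
  }
}
```

    Files whose last-modified time falls outside the window are skipped, so the wildcard `*` no longer means "everything in the folder".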


    Here is a sample flow in ADF pipelines:

    My sample files are all in the same folder.

    Step 1: Create two variables, maxtime and filename. maxtime starts as a baseline datetime for the target date; filename starts as an empty string.

    Step 2: Use a GetMetadata activity and a ForEach activity to list the files under the folder.

    GetMetadata 1 configuration:


    ForEach Activity configuration:


    Step 3: Inside the ForEach activity, use a GetMetadata activity and an If Condition activity.

    GetMetadata 2 configuration:


    If-Condition Activity configuration:


    Step 4: Inside the If Condition's True branch, use two Set Variable activities:

    Set variable1 configuration:


    Set variable2 configuration:


    All of the above steps find the latest file name; after the loop finishes, the filename variable holds exactly the target file.
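    In sketch form, and assuming the inner GetMetadata activity is named Get Metadata2 and the variables are named maxtime and filename (all names here are illustrative, not your actual activity names), the key expressions would be roughly:

```
If Condition expression (is this file newer than the newest seen so far?):
  @greater(ticks(activity('Get Metadata2').output.lastModified), ticks(variables('maxtime')))

Set variable maxtime (True branch):
  @activity('Get Metadata2').output.lastModified

Set variable filename (True branch):
  @item().name
```

    The final filename variable can then be passed into a parameterized dataset for the Copy activity, which resolves the original "file name or wildcard file name is required" error.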


    In addition, GetMetadata 2 needs its own dataset, separate from the one used by GetMetadata 1.

    Method 2: Using Azure Functions

    1. Develop an Azure Function:
    • Create an Azure Function triggered by a blob change event in your ADLS folder.
    • Inside the function, access the blob metadata (including "LastModified") to identify the latest file.
    • Use the Azure Data Factory integration features within the function to trigger a Copy Data activity specifically for the latest file.
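    The core of such a function is just picking the blob with the greatest last-modified timestamp. A minimal sketch in Python, where the (name, last_modified) pairs stand in for the BlobProperties you would get from azure.storage.blob's list_blobs(); the file names and dates below are made up for illustration:

```python
from datetime import datetime, timezone

def latest_blob(blobs):
    """Return the name of the most recently modified blob.

    `blobs` is an iterable of (name, last_modified) pairs, e.g. built
    from ContainerClient.list_blobs(), where each BlobProperties
    exposes .name and .last_modified.
    """
    return max(blobs, key=lambda b: b[1])[0]

# Illustrative sample data (not real blobs):
files = [
    ("data_2024_03_01.csv", datetime(2024, 3, 1, tzinfo=timezone.utc)),
    ("data_2024_03_04.csv", datetime(2024, 3, 4, tzinfo=timezone.utc)),
    ("data_2024_03_02.csv", datetime(2024, 3, 2, tzinfo=timezone.utc)),
]
print(latest_blob(files))  # data_2024_03_04.csv
```

    The function would then pass that single file name as a parameter when triggering the ADF pipeline, so the Copy activity receives an explicit file name rather than a wildcard.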

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".

