Validate files in ADLS using a Logic App or Azure Function for invoking an ADF pipeline

Karthikeyan Shivasankaran 21 Reputation points
2023-01-06T11:31:46.217+00:00

I receive CSV files in an ADLS folder at different intervals, and the list of expected files is available in an Azure SQL table. How can I validate that all the files have arrived in ADLS using either of the options below?
Option 1 - validate using a Logic App and invoke Azure Data Factory.
Option 2 - validate using an Azure Function and invoke Azure Data Factory.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. KranthiPakala-MSFT 46,602 Reputation points Microsoft Employee
    2023-02-08T19:56:37.2833333+00:00

    Hi Karthikeyan Shivasankaran,
    Thanks for the question. I believe there is no direct or easy way to achieve this using ADF's out-of-the-box features.
    Below are a few workarounds you may try:

    1. You could write custom code that checks the file names in the ADLS folder against your SQL table data to make sure that all files have arrived. This custom code can be executed using a Custom activity in ADF. Once the Custom activity confirms that all files are present, a subsequent Copy activity can copy those files; if some files are missing, you can skip the rest of the activity flow. You can use event triggers to start the pipeline when a new file arrives and pass the file names to your Custom activity using a GetMetadata activity.
    2. Another workaround is to use an event trigger to start a parent pipeline. In it, a GetMetadata activity gets all the file names in the folder, and a Lookup activity gets the list of expected files from your SQL table. A condition check with a dynamic expression then validates whether the expected file names from the SQL table are present in the item list from the GetMetadata activity output. If they match, the parent invokes a child pipeline whose Copy activity copies all the expected files. That way you can monitor the two pipelines independently, which makes troubleshooting easier in any odd scenarios. The advantage of having two pipelines (a parent that validates that all the expected files have arrived in the source, and a child that executes the copy operation on the expected files) is that the core/main ADF pipeline is triggered only when all the expected files have arrived or landed in your source.
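
Whichever workaround is used (custom code, or an Azure Function as mentioned in the question), the core check is a comparison between the expected file names and the names found in the folder. Below is a minimal Python sketch of that comparison only. The function names `find_missing_files` and `all_files_arrived` and the sample file names are hypothetical; reading the expected list from the SQL table and listing the ADLS folder are left out and would need to be supplied by your own code. The sketch assumes exact, case-sensitive file names.

```python
from typing import Iterable, List


def base_name(path: str) -> str:
    """Return the file name portion of a path such as 'landing/sales.csv'."""
    return path.strip().rsplit("/", 1)[-1]


def find_missing_files(expected: Iterable[str], arrived: Iterable[str]) -> List[str]:
    """Return the expected file names that are not among the arrived files.

    expected: names from the SQL table (assumed to be bare file names or paths).
    arrived:  names or paths listed from the ADLS folder.
    """
    arrived_names = {base_name(p) for p in arrived}
    expected_names = {base_name(p) for p in expected}
    return sorted(expected_names - arrived_names)


def all_files_arrived(expected: Iterable[str], arrived: Iterable[str]) -> bool:
    """True when every expected file is present in the folder listing."""
    return not find_missing_files(expected, arrived)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the SQL table and the folder listing.
    expected_files = ["sales.csv", "customers.csv", "orders.csv"]
    folder_listing = ["landing/sales.csv", "landing/orders.csv"]
    print(find_missing_files(expected_files, folder_listing))
    print(all_files_arrived(expected_files, folder_listing))
```

The caller would start the ADF pipeline only when `all_files_arrived` returns `True`, and could log the output of `find_missing_files` otherwise to see which files are still pending.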

    I understand this is a tricky requirement, but if you want to avoid the custom code, then it would take multiple steps to implement this in ADF itself.

    Hope this helps.
