Ideas for generating a "list of files" from ADLS Gen2 (CSV files) for the ADF Copy Data activity

Saru Thiagarajan 31 Reputation points
2021-01-27T23:27:54.953+00:00

The Data Factory/Synapse Copy Data activity source has a feature that points to a text file listing each file we want to copy to the sink. The functionality works great, but I'm racking my brain over how to generate that text file in the first place from the files in blob storage. It worked because I created the file list manually and uploaded it to the blob container, but that isn't going to work in an end-to-end flow.
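For reference, the file behind the "List of files" option (the `fileListPath` setting in the copy source's store settings) is plain text with one file per line, each path relative to the folder configured in the source dataset. Below is a minimal Python sketch of building that text from blob names. It assumes the names are full paths within the container and that only `.csv` files are wanted. The function `build_file_list` is hypothetical, not an Azure API:

```python
from pathlib import PurePosixPath


def build_file_list(blob_names, dataset_folder, suffix=".csv"):
    """Build the text of a copy-activity file list (hypothetical helper).

    Each output line is a path relative to dataset_folder, the folder
    configured in the source dataset. Names outside that folder, or
    without the given suffix, are skipped.
    """
    base = PurePosixPath(dataset_folder)
    lines = []
    for name in blob_names:
        path = PurePosixPath(name)
        if path.suffix.lower() != suffix.lower():
            continue  # not a file type we want to copy
        try:
            relative = path.relative_to(base)
        except ValueError:
            continue  # outside the dataset folder, so it cannot be listed relative to it
        lines.append(relative.as_posix())
    # One path per line, with a trailing newline when there is at least one entry
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    names = [
        "staging/2021/01/sales_a.csv",
        "staging/2021/01/sub/sales_b.csv",
        "staging/2021/01/readme.txt",
        "archive/old.csv",
    ]
    print(build_file_list(names, "staging/2021/01"), end="")
```

With the sample names, only `sales_a.csv` and `sub/sales_b.csv` end up in the list. The `.txt` file and the file outside the dataset folder are skipped.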

In the past, I've written a shell script to generate the file list and executed it before the session/mapping that does the actual load to staging tables, etc. (you know which ETL tool I'm talking about), but how can we do this in the Azure ADF landscape?
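The closest Azure analog to that shell script is a small script that lists the blobs and uploads the resulting file list before the copy runs, for example from an Azure Function activity or a Custom activity. Here is a minimal sketch. It assumes the `azure-storage-blob` v12 package, whose `ContainerClient` provides `list_blobs(name_starts_with=...)` and `upload_blob(name, data, overwrite=True)`. The function name, folder, and output blob name are hypothetical:

```python
def write_file_list(container_client, folder, list_blob_name, suffix=".csv"):
    """List blobs under `folder` and upload a copy-activity file list.

    `container_client` is expected to behave like
    azure.storage.blob.ContainerClient. `folder` should match the folder
    configured in the copy source dataset, because each line written is a
    path relative to it. Returns the relative paths that were written.
    """
    prefix = folder.rstrip("/") + "/"
    relative_paths = []
    for blob in container_client.list_blobs(name_starts_with=prefix):
        # With a hierarchical namespace, directories may also show up in the
        # listing; keeping only names with the wanted suffix filters them out.
        if blob.name.lower().endswith(suffix.lower()):
            relative_paths.append(blob.name[len(prefix):])
    relative_paths.sort()
    # One relative path per line, as the "List of files" option expects
    body = "".join(path + "\n" for path in relative_paths)
    container_client.upload_blob(list_blob_name, body.encode("utf-8"), overwrite=True)
    return relative_paths
```

A caller would pass something like `ContainerClient.from_connection_string(conn_str, container_name="landing")` (the container name is a placeholder) and then point the copy activity's list-of-files path at the uploaded blob. Passing the client in instead of creating it inside the function also makes the logic easy to test with a stand-in object.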

I'm thinking of using a Get Metadata activity on the container, looping through each item and inserting the names into a database, then having a stored procedure group them into their respective "file lists". But how can I make ADF create a blob storage file containing that list of files? Another option is to merge all the files using the same Get Metadata activity. This seems like a simple feature, and I don't mean to beat a dead horse, but I still don't have a clear design path for it.
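The Get Metadata output already contains the file names, so the database and stored-procedure step could in principle be replaced by code that groups them directly. Here is a Python sketch. It assumes the usual `childItems` shape: a list of objects with `name` and `type`, where `type` is `File` or `Folder`. Note that `childItems` covers only the immediate children of the folder. The grouping rule (the text before the first underscore) is a hypothetical stand-in for whatever the stored procedure would do:

```python
import json
from collections import defaultdict


def group_child_items(get_metadata_output, suffix=".csv"):
    """Group file names from a Get Metadata 'childItems' output into file lists.

    Files are grouped by the text before the first underscore, e.g.
    'sales_20210101.csv' -> 'sales'. That rule is only an example.
    Returns {group: file-list text}, one file name per line.
    """
    groups = defaultdict(list)
    for item in get_metadata_output.get("childItems", []):
        name = item["name"]
        if item.get("type") != "File" or not name.lower().endswith(suffix.lower()):
            continue  # skip sub-folders and other file types
        groups[name.split("_", 1)[0]].append(name)
    return {g: "\n".join(sorted(names)) + "\n" for g, names in sorted(groups.items())}


if __name__ == "__main__":
    sample = json.loads("""
    {"childItems": [
        {"name": "sales_20210101.csv", "type": "File"},
        {"name": "returns_20210101.csv", "type": "File"},
        {"name": "sales_20210102.csv", "type": "File"},
        {"name": "archive", "type": "Folder"}
    ]}
    """)
    for group, text in group_child_items(sample).items():
        print(f"--- {group}.txt ---")
        print(text, end="")
```

Each value could then be written to its own blob (for example `filelists/sales.txt`, a hypothetical naming scheme) and passed to a separate copy activity.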

Any guidance is greatly appreciated.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. HarithaMaddi-MSFT 10,126 Reputation points
    2021-01-28T09:24:37.457+00:00

    Hi @Saru Thiagarajan ,

    Welcome to the Microsoft Q&A platform, and thanks for posting the query.

    One approach I can think of is shown below: it uses array and string variables to store the file names, which can later be written into a blob file using a Copy activity.

    [Animated GIF: filenameslistblob.gif (image not included)]

    Please let us know if you have further queries, and we will be glad to assist.


  2. Saru Thiagarajan 31 Reputation points
    2021-04-02T20:48:30.097+00:00

    My sincere apologies for the delayed response.
    I went with a data flow. The sink was of type ADLS Gen2 with schema drift enabled, and in the sink settings I chose the file name option "Name file as a column data".

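The array-and-string-variable approach from the first answer can be read as the following pipeline logic. A Get Metadata activity returns `childItems`. A ForEach over `@activity('Get Metadata1').output.childItems` runs an Append Variable activity that adds `@item().name` to an array variable. A Set Variable activity then joins the array into one string (ADF's `join()` expression function takes a collection and a delimiter). This is one plausible reading of the answer, not a transcription of its screenshot: the activity name `Get Metadata1` and the variable names are assumptions. The Python below only mirrors that logic outside ADF so the result is easy to check:

```python
def file_names_to_text(child_items):
    """Mirror the array-then-string variable pattern from the answer.

    file_names plays the role of the array variable filled by Append Variable
    inside a ForEach; the returned string plays the role of the string
    variable built with join(), one name per line.
    """
    file_names = []  # the "array" variable
    for item in child_items:  # ForEach over childItems
        if item.get("type") == "File":  # keep files, skip sub-folders
            file_names.append(item["name"])  # Append Variable with @item().name
    return "\n".join(file_names)  # Set Variable: join the array with a newline


if __name__ == "__main__":
    items = [
        {"name": "a.csv", "type": "File"},
        {"name": "nested", "type": "Folder"},
        {"name": "b.csv", "type": "File"},
    ]
    print(file_names_to_text(items))
```

In the pipeline, the resulting string variable is what the answer then writes to a blob file with a Copy activity.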