Problem with setting up a wildcard in the Copy activity in ADF

braxx 426 Reputation points
2022-10-07T10:19:04.547+00:00

I already have a pipeline which copies files from one blob storage to another.
It takes the specific location defined in the dataset: container/root folder/folderName and copies all the files with names ending with *data_file.parquet. That works and looks like this:

248486-11.png

In my new scenario I want to apply an additional filter to the files being copied. To explain, let's use an example. Here is my folder structure on the blob storage:

Cointainer_name/MDB2B_v2/FB_AI_NA_nonAdd_Cntry_acc_to_ad_Cqrt/v1/current/NATIVEID=act_958164000888181/COMMON_DATE=2021-01-01/data_file.parquet

In the current scenario files are filtered to:
Container = Cointainer_name
Root folder = MDB2B_v2
FolderName = FB_AI_NA_nonAdd_Cntry_acc_to_ad_Cqrt
Files = *data_file.parquet

In the new scenario I still want to filter by Container, Root folder and FolderName, but additionally copy only the files whose paths end with *01-01/data_file.parquet:
Container = Cointainer_name
Root folder = MDB2B_v2
FolderName = FB_AI_NA_nonAdd_Cntry_acc_to_ad_Cqrt
Files = *01-01/data_file.parquet

And here comes the problem: I do not know how to set this up properly in the wildcard settings.
I tried this:

248513-12.png

But during the copy, no files matching the condition were found and nothing was copied. I guess my setup is wrong, as the files are definitely there.

TIA

Azure Data Factory

2 answers

  1. braxx 426 Reputation points
    2022-10-10T15:07:32.573+00:00

    Finally, I managed to resolve it. Here is the right syntax:

    wildcard path name:

    @concat('MDB2B_v2','/',pipeline().parameters.folderName,' */*/*/ ','*01-01')

    file name:

    data_file.parquet

    1 person found this answer helpful.
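
For anyone who wants to sanity-check a wildcard pair before running the pipeline, here is a minimal Python sketch. It is not ADF's own matcher: it only simulates the idea locally, assuming that each `*` matches within a single folder level and that the wildcard folder path above resolves to `MDB2B_v2/<folderName>/*/*/*/*01-01` (one `*` for each of `v1`, `current` and `NATIVEID=...`). The helper names `folder_matches` and `blob_selected` are hypothetical.

```python
from fnmatch import fnmatchcase


def folder_matches(wildcard_folder: str, folder_path: str) -> bool:
    """Compare level by level so that '*' never crosses a '/' (assumption)."""
    pattern_levels = wildcard_folder.split("/")
    folder_levels = folder_path.split("/")
    if len(pattern_levels) != len(folder_levels):
        return False
    return all(
        fnmatchcase(level, pattern)
        for pattern, level in zip(pattern_levels, folder_levels)
    )


def blob_selected(wildcard_folder: str, wildcard_file: str, blob_path: str) -> bool:
    """Split the blob path into folder and file name, then test both wildcards."""
    folder_path, _, file_name = blob_path.rpartition("/")
    return folder_matches(wildcard_folder, folder_path) and fnmatchcase(
        file_name, wildcard_file
    )


# Stand-in for pipeline().parameters.folderName
folder_name = "FB_AI_NA_nonAdd_Cntry_acc_to_ad_Cqrt"
base = f"MDB2B_v2/{folder_name}"

# Assumed result of the concat expression: three intermediate levels, then *01-01
wildcard_folder = f"{base}/*/*/*/*01-01"
wildcard_file = "data_file.parquet"

jan_first = f"{base}/v1/current/NATIVEID=act_958164000888181/COMMON_DATE=2021-01-01/data_file.parquet"
feb_first = f"{base}/v1/current/NATIVEID=act_958164000888181/COMMON_DATE=2021-02-01/data_file.parquet"

print(blob_selected(wildcard_folder, wildcard_file, jan_first))  # True
print(blob_selected(wildcard_folder, wildcard_file, feb_first))  # False

# Putting the folder part into the file name wildcard does not match in this model,
# because the file name itself never contains a '/'.
print(blob_selected(base, "*01-01/data_file.parquet", jan_first))  # False
```

The point of the sketch is the split: the date filter belongs to the last folder level of the wildcard folder path, while the file name wildcard only ever sees `data_file.parquet`.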

  2. AnnuKumari-MSFT 31,721 Reputation points Microsoft Employee
    2022-10-07T11:09:35.963+00:00

    Hi @braxx ,

    Thank you for using the Microsoft Q&A platform, and thanks for posting your question here.

    As I understand it, you are trying to copy data based on a wildcard condition in the copy activity of an ADF pipeline. Please let me know if my understanding is incorrect.

    First of all, when we use '/' in ADLS, the folder name gets separated into subfolders.

    For example, I can't have a folder named 2022/09/13. It simply means that folder 2022 has a subfolder named 09, which has another subfolder named 13, which contains the files underneath.

    248531-image.png
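
To make the folder levels concrete, here is a small Python sketch that splits the example path from the question (inside the container) into its levels. It only illustrates how '/' separates folders and is not an ADF or storage SDK call.

```python
from pathlib import PurePosixPath

# Example blob path from the question, relative to the container
blob = PurePosixPath(
    "MDB2B_v2/FB_AI_NA_nonAdd_Cntry_acc_to_ad_Cqrt/v1/current/"
    "NATIVEID=act_958164000888181/COMMON_DATE=2021-01-01/data_file.parquet"
)

folder_levels = blob.parent.parts  # every '/' starts a new folder level
file_name = blob.name              # the last segment is the file itself

for depth, level in enumerate(folder_levels, start=1):
    print(depth, level)
print("file:", file_name)
```

Here the date appears in the last folder level (COMMON_DATE=2021-01-01), not in the file name.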

    Now, coming to your query: what I understand from the folder structure is that COMMON_DATE=2021-01-01 is the folder name, and it contains the file named 'data_file.parquet'. Kindly share a screenshot of your storage account if that is not the case.

    If you want to copy only files starting with 01-01, you can try using */*/*/* in the wildcard folder path and 01-01*data_file.parquet in the wildcard file name.

    Hope this helps. Please let us know if you have any further queries.
