How to use wildcard for XML Files in Copy Data

Jörg Lang 120 Reputation points
2024-06-19T09:19:37.5+00:00

I have an Azure Data Lake Storage Gen2 account with the following folder and file structure:

  • .\source
  • .\source\SystemA_20240618.xml
  • .\source\SystemB_20240618.xml
  • .\source\SystemA_20240619.xml
  • .\source\SystemB_20240619.xml

I need to process only the files matching .\source\SystemA_*.xml within a single data flow/pipeline.

If I name the files explicitly in the dataset configuration, I can process them, but I don't want to modify the dataset every day.

Please help.

Azure Data Factory

1 answer

Sort by: Most helpful
  1. phemanth 8,645 Reputation points Microsoft Vendor
    2024-06-19T09:55:42.7066667+00:00

    @Jörg Lang

    Thanks for the question and using MS Q&A platform.

To process only the files matching the pattern .\source\SystemA_*.xml within a single data flow or pipeline, without modifying the dataset configuration daily, you can use dynamic content in your data flow or pipeline configuration. Here's how you can achieve this:

    Dynamic Content in Data Flow:

    • In your data flow, create a parameter (let's call it SourceFilePath) that represents the folder path where your XML files are located (e.g., .\source).
    • Use this parameter in your source dataset configuration. For example, if you're using a File System source, set the folder path to @{dataset().SourceFilePath}.
    • In your data flow source settings, combine the SourceFilePath parameter with a wildcard (SystemA_*.xml) so that only files matching .\source\SystemA_*.xml are read.
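
    The parameterized dataset described above can be sketched as JSON. This is a minimal sketch, not a definitive definition: the dataset name, linked service name, and container name `mycontainer` are placeholders, and the `SystemA_*.xml` wildcard itself is set in the data flow source's wildcard path setting rather than in the dataset.

    ```json
    {
      "name": "XmlSourceDataset",
      "properties": {
        "type": "Xml",
        "linkedServiceName": {
          "referenceName": "AdlsGen2LinkedService",
          "type": "LinkedServiceReference"
        },
        "parameters": {
          "SourceFilePath": { "type": "string", "defaultValue": "source" }
        },
        "typeProperties": {
          "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "mycontainer",
            "folderPath": { "value": "@dataset().SourceFilePath", "type": "Expression" }
          }
        }
      }
    }
    ```

    With this dataset in place, you only change the parameter value (or leave its default) instead of editing the dataset each day.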

    Dynamic Content in Pipeline:

    • Create a pipeline parameter (e.g., SourceFolderPath) representing the folder path where your XML files reside (e.g., .\source).
    • In your pipeline, use this parameter in a ForEach activity to iterate over the files matching the pattern .\source\SystemA_*.xml.
    • Inside the ForEach activity, configure your data flow activity to read the current file (using @item().name or @concat(pipeline().parameters.SourceFolderPath, '/', item().name)).
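
    The ForEach pattern above can be sketched as a Get Metadata activity that lists the folder, feeding a Filter activity that keeps only the SystemA files. This is a minimal sketch under assumptions: the activity names `GetFileList` and `FilterSystemAFiles` and the dataset name `SourceFolderDataset` are placeholders you would replace with your own.

    ```json
    [
      {
        "name": "GetFileList",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "SourceFolderDataset", "type": "DatasetReference" },
          "fieldList": [ "childItems" ]
        }
      },
      {
        "name": "FilterSystemAFiles",
        "type": "Filter",
        "dependsOn": [
          { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "items": {
            "value": "@activity('GetFileList').output.childItems",
            "type": "Expression"
          },
          "condition": {
            "value": "@and(startswith(item().name, 'SystemA_'), endswith(item().name, '.xml'))",
            "type": "Expression"
          }
        }
      }
    ]
    ```

    The ForEach activity then iterates over @activity('FilterSystemAFiles').output.value and passes @item().name to the data flow for each matching file.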

    By using dynamic content, you avoid hardcoding the file names and adapt to new files in the folder without modifying the dataset configuration daily.

    Hope this helps. Do let us know if you have any further queries.